(57) An asynchronous network interface and a method of synchronisation between two applications, or an 
application and remote hardware, on different computers are provided. The network interface contains 
snooping hardware which can be programmed to contain triggering values comprising either addresses, 
address ranges, or other data which can be matched, called "trip wires". The interface monitors the data 
stream passing through it for addresses and data matching these trip wires. On a match, it can generate 
interrupts, increment event counters, or perform some other application-specified action. This snooping 
hardware is preferably based upon Content-Addressable Memory. 

Also disclosed are methods of transferring data using a buffer with pointers to read and write areas, 
which can be updated over the network, and means for marshalling and unmarshalling data streams using 
software stubs. Destination addresses of data in a burst can be deduced from its position in a burst, and a 
burst can be split in two during transmission. 
Fig. 15. Client-server interaction using a NIC and a hardware data source 



At least one drawing originally filed was informal and the print reproduced here is taken from a later filed formal copy. 
LOW LATENCY NETWORK 

This invention, in its various aspects, relates to the field 
of asynchronous networking, and specifically to: a memory 
mapped network interface; a method of synchronising between 
a sending application, running on a first computer, and a 
receiving application, running on a second computer, the 
computers each having a memory mapped network interface; a 
communication protocol; and a computer network. 

For a number of reasons, traditional networks, such as 
Gigabit Ethernet, ATM, etc., have not been able to deliver 
high bandwidth and low latency to applications that require 
them. A traditional network is shown in Fig. 1. To move 
data from computer 200 to another computer 201 over a 
network, the Central Processing Unit (CPU) 202 writes data 
from memory 204 through its system controller 206 to its 
Network Interface Card (NIC) 210. Alternatively, data may be 
transferred to the NIC 210 using Direct Memory Access (DMA) 
hardware 212 or 214. The NIC 210 takes the data and forms 
network packets 216, which contain enough information to 
allow them to be routed across the network 218 to computer 
system 201. 

When a network packet arrives at the NIC 211, it must be 
demultiplexed to determine where the data needs to be 
placed. In traditional networks this must be done by the 
operating system. The incoming packet therefore generates 
an interrupt 207, which causes software, a device driver in 
operating system 209, to run. The device driver examines the 
header information of each incoming network packet 216 and 
determines the correct location in memory 205 for data 
contained within the network packet. The data is transferred 
into memory using the CPU 203 or DMA hardware (not shown). 
The driver may then request that operating system 209 
reschedule any application process that is blocked waiting 
for this data to arrive. Thus there is a direct sequence 
from the arrival of incoming packets to the scheduling of 
the receiving application. These networks therefore provide 
implicit synchronisation between sending and receiving 
applications and are called synchronous networks. 

It is difficult to achieve optimum performance using modern 
synchronous network hardware. One reason is that the number 
of interrupts that need to be processed increases as packets 
are transmitted at a higher rate. Each interrupt requires 
that the operating system is invoked and software is 
executed for each packet. Such overheads increase both 
latency and the data transfer size threshold at which the 
maximum network bandwidth is achieved. 

These observations have led to the development of 
asynchronous networks. In asynchronous networks, the final 
memory location within the receiving computer for received 
data can be computed by the receiving NIC from the header 
information of a received network packet. This computation 
can be done without the aid of the operating system. 

Hence, in asynchronous networks there is no need to generate 
a system interrupt on the arrival of incoming data packets. 
Asynchronous networks therefore have the potential of 
delivering high bandwidth and low latency, much greater than 
synchronous networks. The Virtual Interface Architecture 
(VIA) is emerging as a standard for asynchronous networking. 
Memory-mapped networks are one example of asynchronous 
networks. An early computer network using memory mapping is 
described in US patent No. 4,393,443. 

A memory-mapped network is shown in Fig. 2. Application 222 
running on Computer 220 would like to communicate with 
application 223 running on Computer 221 using network 224. 
A portion of the application 222's memory address space is 
mapped using the computer 220's virtual memory system onto 
a memory aperture of the NIC 226 as shown by the 
application's page-tables 228 (these page-tables and their 
use are well known in the art). Likewise, a portion of 
application 223's memory address space is mapped using 
computer 221's virtual memory system onto a memory aperture 
of the NIC 229 using the application 223's page-tables 231. 

Software is usually required to create these mappings, but 
once they have been made, data transfer to and from a remote 
machine can be achieved using a CPU read or write 
instruction to a mapped virtual memory address. 

If application 222 were to issue a number of processor write 
instructions to this part of its address space, the virtual 
memory and I/O controllers of computer 220 will ensure that 
these write instructions are captured by the memory aperture 
of the NIC 226. NIC 226 determines the address of the 
destination computer 221 and the address of the remote 
memory aperture 225 within that computer. Some combination 
of this address information can be regarded as the network 
address, which is the target of the write. 

All the aperture mappings and network address translations 
are calculated at the time that the connection between the 
address spaces of computers 220 and 221 is made. The process 
of address lookups and translations at each stage in the 
system can be carried out using hardware. 

After receiving a write, NIC 226 creates network packets 
using its packetisation engine 230. These packets are 
forwarded to the destination computer 221. At the 
destination, the memory aperture addresses of the incoming 
packets are remapped by the packet handler onto physical 
memory locations 227. The destination NIC 229 then writes 
the incoming data to these physical memory locations 227. 
This physical memory has also been mapped at connection 
set-up time into the address space of application 223. Hence 
application 223 is able, using page-tables 231 and the 
virtual memory system, to access the data using processor 
read and write operations. 
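
The translation chain described above (local aperture address, then network address, then remote physical address) can be illustrated with a small software model. This is a sketch only: the class names, aperture sizes and table layout are assumptions made for illustration and do not come from the specification, which performs these steps in hardware.

```python
from dataclasses import dataclass, field


@dataclass
class ReceivingNic:
    """Hypothetical model of a destination NIC such as NIC 229."""
    # Maps a remote aperture base address to a physical memory base address.
    aperture_to_physical: dict = field(default_factory=dict)
    memory: dict = field(default_factory=dict)  # physical address -> word

    def handle_packet(self, aperture_base, offset, word):
        # Remap the aperture address onto a physical location and store
        # the word there without involving the operating system.
        physical = self.aperture_to_physical[aperture_base] + offset
        self.memory[physical] = word


@dataclass
class SendingNic:
    """Hypothetical model of a source NIC such as NIC 226."""
    local_aperture_base: int
    aperture_size: int
    remote: ReceivingNic
    remote_aperture_base: int

    def write(self, local_address, word):
        # The translation is fixed at connection set-up time:
        # local aperture offset -> (destination NIC, remote aperture address).
        offset = local_address - self.local_aperture_base
        if not 0 <= offset < self.aperture_size:
            raise ValueError("address is outside the mapped aperture")
        self.remote.handle_packet(self.remote_aperture_base, offset, word)


receiver = ReceivingNic(aperture_to_physical={0x8000: 0x40000})
sender = SendingNic(local_aperture_base=0x1000, aperture_size=0x100,
                    remote=receiver, remote_aperture_base=0x8000)
sender.write(0x1010, 0xCAFE)  # an ordinary store into the mapped region
```

In this model the receiving application would simply read physical location 0x40010 through its own page tables; nothing in the path tells it that the data has arrived, which is the synchronisation gap discussed next.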

Commercial equipment for building memory-mapped networks is 
available from a number of vendors, including Dolphin 
Interconnect Solutions. Industry standards, such as Scalable 
Coherent Interface (SCI) (IEEE Standard 1596-1992), have 
been defined for building memory mapped networks, and 
implementations to the standards are currently available. 

SCI is an example of an asynchronous network standard, which 
provides poor facilities for synchronisation at the time of 
data reception. A network using SCI is disclosed in US 
patent No. 5,819,075. Figure 3 shows an example of an SCI-like 
network, where application 242 on computer 240 would 
like to communicate with application 243 on computer 241. 
Let us suppose that application 243 has blocked waiting for 
the data. Application 242 transmits data using the methods 
described above. After sending the data, application 242 
must then construct a synchronisation packet in local 
memory, and program the event generator 244, in NIC 246, to 
send the synchronisation packet 248 to the destination 
node. 

On receiving synchronisation packet 248, the NIC 245 on 
computer 241 invokes its event handler 247, which generates 
an interrupt 249 allowing the operating system 248 to 
determine that application 243 is blocked and should be 
woken up. This is called out-of-band synchronisation since 
the synchronisation packet must be treated as a separate and 
distinct entity and not as part of the data stream. 
Out-of-band synchronisation greatly reduces the potential of 
memory-mapped networks to provide high bandwidth and low 
latency. 

In other existing asynchronous networks, such as the newly 
emerging Virtual Interface Architecture (VIA) standard, some 
support is provided for synchronisation. A NIC will raise a 
hardware interrupt when some data has arrived. However, the 
interrupt does not identify the recipient of the data; 
instead it only indicates that some data has arrived for some 
communicating end-point. 

While delivery of data can be achieved solely by hardware, 
the software task of scheduling between a large number of 
applications, each handling received data, becomes difficult 
to achieve. Software, known as a device driver, is required 
to examine a large number of memory locations to determine 
which applications have received data. It must then notify 
such applications that data has been delivered to them. This 
might include a reschedule request to the operating system 
for the relevant applications. 



The present invention, in its various aspects, is defined in 
more detail in the appended claims to which reference should 
now be made. 

A first aspect of the invention provides a method of 
synchronising between a sending application on a first 
computer and a receiving application on a second computer, 
each computer having a main memory, and at least one of the 
computers having an asynchronous network interface, 
comprising the steps of: 

providing the asynchronous network interface with a set 
of rules for directing incoming data to memory locations in 
the main memory of the second computer; 

storing in the network interface one or more triggering 
value(s), each triggering value representing a state of a 
data transfer between the applications; 

receiving, at the network interface, a data stream 
being transferred between the applications; 

comparing at least part of the data stream received 
with the stored triggering values; 

if the compared part of the data stream matches any 
stored triggering value, indicating that the triggering 
value has been matched; and 

storing the data received in the main memory of the 
second computer at one or more memory location(s) in 
accordance with the said rules. 

Another aspect of the invention provides an asynchronous 
network interface for use in a host computer having a main 
memory and connected to a network, the interface comprising: 

means for storing a set of rules for directing incoming 
data to memory locations in the main memory of the host 
computer; 

a memory for storing one or more triggering value(s), 
each value representing a state of a data transfer between 
two or more applications in the computer network; 

a receiver for receiving a data stream being 
transferred between two or more applications in the computer 
network; 

comparison means for comparing at least part of the 
data stream received by the network interface with the 
stored triggering values; and 

a memory for storing information identifying any 
matched triggering values. 

A further aspect of the invention provides a method of 
passing data between an application on a first computer and 
remote hardware within a second computer or on a passive 
backplane, the first computer having a main memory and an 
asynchronous network interface, the method comprising the 
steps of: 

providing the asynchronous network interface with a set 
of rules for directing incoming data to memory or I/O 
location(s) of the remote hardware; 

storing in the network interface one or more triggering 
value(s), each triggering value representing a state of a 
data transfer between the application and the hardware; 

receiving, at the network interface, a data stream 
being transferred between the application and the hardware; 

comparing at least part of the data stream received 
with the stored triggering value(s); 

indicating that a triggering value has been matched, if 
any compared part of the data stream matches a triggering 
value; 

storing data transmitted in memory or I/O location(s) 
of the remote hardware in accordance with the said rules; 
and 

storing the data received in the main memory of the 
computer at one or more memory location(s) in accordance 
with the said rules. 

A further aspect of the invention provides a method of 
arranging data transfers from one or more applications on a 
computer, the computer having a main memory, an asynchronous 
network interface, and a Direct Memory Access (DMA) engine 
having a request queue address common to all the 
applications, comprising the steps of: 

the application requesting the network interface to 
store a triggering value corresponding to a property of the 
data block to be transferred; 

an application requesting the DMA engine to transfer a 
block of data; 

the network interface storing a triggering value 
corresponding to a property of the data block to be 
transferred, along with an identification of the application 
which requested the DMA transfer; 

the network interface monitoring the data stream being 
sent by the applications and comparing at least part of the 
data stream with the triggering value(s) stored in 
memory; and 

if any triggering value matches, indicating that that 
triggering value has matched. 

A yet further aspect of the invention provides a method of 
transferring data from a sending application on a first 
computer to a receiving application on a second computer, 
each computer having a main memory, and a memory mapped 
network interface, the method comprising the steps of: 

creating a buffer in the main memory of the second 
computer for storing data being transferred as well as data 
identifying one or more pointer memory location(s); 

storing at said pointer memory location(s) at least one 
write pointer (WRP) and at least one read pointer (RDP) for 
indicating those areas of the buffer available for writes 
and for reads; 

in dependence on the values of the WRP(s) and RDP(s), 
the sender application writing to the buffer; 

updating the value of the WRP(s), after a write has 
taken place, to update the indication of the areas of the 
buffer available for reads and writes; 

in dependence on the values of WRP(s) and RDP(s), the 
receiver application reading from the buffer; and 

updating the value of the RDP(s), after a read has 
taken place, to update the indication of the areas of the 
buffer available for reads and writes. 
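
The pointer-based buffer of this aspect can be sketched as a single-producer, single-consumer ring. This is an illustrative model under stated assumptions: the class and method names are hypothetical, one slot is deliberately kept empty to distinguish a full buffer from an empty one, and in the described system the buffer and pointers would live in memory reachable over the network rather than in one process.

```python
class CircularBuffer:
    """Single-producer, single-consumer ring; one slot is kept empty."""

    def __init__(self, size):
        self.data = [None] * size
        self.size = size
        self.wrp = 0  # write pointer: next slot the sender will fill
        self.rdp = 0  # read pointer: next slot the receiver will drain

    def free_slots(self):
        return (self.rdp - self.wrp - 1) % self.size

    def used_slots(self):
        return (self.wrp - self.rdp) % self.size

    def write(self, words):
        # Sender: consult both pointers, write, then update the write pointer.
        if len(words) > self.free_slots():
            return False
        for word in words:
            self.data[self.wrp] = word
            self.wrp = (self.wrp + 1) % self.size
        return True

    def read(self):
        # Receiver: consult both pointers, read, then update the read pointer.
        out = []
        while self.rdp != self.wrp:
            out.append(self.data[self.rdp])
            self.rdp = (self.rdp + 1) % self.size
        return out


buf = CircularBuffer(4)
ok_first = buf.write([1, 2, 3])   # fills the three usable slots
ok_second = buf.write([4])        # rejected: no free slot
drained = buf.read()              # receiver frees the slots
ok_third = buf.write([4, 5])      # accepted; wraps past the end of the list
```

Because the sender only ever modifies the write pointer and the receiver only the read pointer, each side can update its pointer with a plain remote write and no lock is needed.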

Another aspect of the invention provides a computer network 
comprising two computers, the first computer running a 
sending application and the second computer running a 
receiving application, each computer having a main memory 
and a memory mapped network interface, the main memory of 
the second computer having: a buffer for storing data being 
transferred between computers as well as data identifying 
one or more pointer memory location(s); 

means for reading at least one write pointer (WRP) and 
at least one read pointer (RDP) stored at (a) pointer memory 
location(s), for indicating those areas of the buffer 
available for writes and those areas available for reads; 

the network interface of the second computer 
comprising: 

a memory mapping; 

means for reading data from the buffer in accordance 
with the contents of the WRP(s) and RDP(s); and 

means for updating the value of the RDP(s), after a 
read has taken place, to update the indication of the areas 
of the buffer available for reads and writes. 

A further aspect of the invention provides a method of 
sending a request from a client application on a first 
computer to a server application on a second computer and 
sending a response from the server application to the client 
application, both computers having a main memory and a 
memory mapped network interface, the method comprising the 
steps of: 

(A) providing a buffer in the main memory of each 
computer; 

(B) the client application providing software stubs 
which produce a marshalled stream of data representing the 
request; 

(C) the client application sending the marshalled 
stream of data to the server's buffer; 

(D) the server application unmarshalling the stream of 
data by providing software stubs which convert the 
marshalled stream of data into a representation of the 
request in the server's main memory; 

(E) the server application processing the request and 
generating a response; 

(F) the server application providing software stubs 
which produce a marshalled stream of data representing the 
response; 

(G) the server application sending the marshalled 
stream of data to the client's buffer; and 

(H) the client application unmarshalling the received 
stream of data by providing software stubs which convert the 
received marshalled stream of data into a representation of 
the response in the client's main memory. 
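
The marshalling and unmarshalling stubs of this aspect can be sketched with Python's standard `struct` module. The wire layout, the opcode value and the function names below are all assumptions chosen for illustration; the specification does not define a stub format, and the transfer of each stream into the peer's buffer is reduced here to passing a bytes object.

```python
import struct

REQUEST_FORMAT = "<HII"   # hypothetical layout: opcode, operand a, operand b
RESPONSE_FORMAT = "<I"    # hypothetical layout: result
OP_ADD = 1                # hypothetical opcode


def marshal_request(opcode, a, b):
    """Client stub: flatten a request into a byte stream."""
    return struct.pack(REQUEST_FORMAT, opcode, a, b)


def unmarshal_request(stream):
    """Server stub: rebuild the request from the byte stream."""
    return struct.unpack(REQUEST_FORMAT, stream)


def marshal_response(result):
    """Server stub: flatten the response into a byte stream."""
    return struct.pack(RESPONSE_FORMAT, result)


def unmarshal_response(stream):
    """Client stub: rebuild the response from the byte stream."""
    return struct.unpack(RESPONSE_FORMAT, stream)[0]


def server(stream):
    # Unmarshal, process the request, then marshal the response.
    opcode, a, b = unmarshal_request(stream)
    if opcode != OP_ADD:
        raise ValueError("unknown opcode")
    return marshal_response(a + b)


wire_request = marshal_request(OP_ADD, 20, 22)   # client marshals
wire_response = server(wire_request)             # stands in for both transfers
answer = unmarshal_response(wire_response)       # client unmarshals
```

Using an explicit little-endian format with no padding keeps the stream layout identical on both machines, which matters when the bytes are written directly into a remote buffer.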

Another aspect of the invention provides a method of 
arranging data for transfer as a data burst over a computer 
network comprising the steps of: providing a header 
comprising the destination address of a certain data word in 
the data burst, and a signal at the beginning or end of the 
data burst for indicating the start or end of the burst, the 
destination addresses of other words in the data burst being 
inferrable from the address in the header. 

A further aspect of the invention provides a method of 
processing a data burst received over a computer network 
comprising the steps of: 

reading a reference address from the header of the data 
burst, and 

calculating the addresses of each data word in the 
burst from the position of that data word in the burst in 
relation to the position of the data word to which the 
address in the header corresponds, and from the reference 
address read from the header. 
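
The address calculation in this aspect is simple arithmetic, sketched below. The 4-byte word size and the function name are assumptions for illustration; the specification only requires that each word's address follow from its position relative to the word the header refers to.

```python
WORD_BYTES = 4  # assumed word size, not stated in the text


def word_addresses(reference_address, word_count, reference_index=0):
    """Return the destination address of every word in a burst.

    reference_index is the position in the burst of the word whose
    address is carried in the header.
    """
    return [reference_address + (i - reference_index) * WORD_BYTES
            for i in range(word_count)]


# Header carries the address of the first word of a four-word burst.
addresses = word_addresses(0x1000, 4)

# Same burst, but with the header describing its last word instead.
same_addresses = word_addresses(0x100C, 4, reference_index=3)
```

Only one address travels with the burst, so the per-word overhead stays constant however long the burst is.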

Another aspect of the invention provides a method of 
interrupting transfer of a data burst over a computer 
network comprising the steps of: 

halting transfer of a portion of the data burst which 
has not yet been transferred, thereby splitting the data 
into two burst sections, one which is transferred, and 
one waiting to be transferred. 

A further aspect of the invention provides a method of 
restarting the transfer of a data burst, after the transfer 
of that data burst has been interrupted, the method 
comprising the steps of: 

calculating a new reference address for the 
untransferred data burst section from the address contained 
in the header of the whole data burst, and from the position 
in the whole data burst of the first data word of the 
untransferred data burst section in relation to the position 
of the data word to which the address in the header 
corresponds; 

providing a new header for the untransferred data burst 
section comprising the new reference address; and 

transmitting the new header along with the untransferred 
data burst section. 
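
Splitting a burst and regenerating the header for the remainder can be sketched as follows. The sketch assumes that the header address refers to the first word of the burst and that words are 4 bytes wide; both are illustrative assumptions, and the function name is hypothetical.

```python
WORD_BYTES = 4  # assumed word size


def split_burst(header_address, words, words_sent):
    """Split a burst after `words_sent` words have been transferred.

    Returns (sent_section, pending_section), each as a
    (header_address, words) pair. The pending section gets a new
    header address computed from its position in the original burst.
    """
    sent = (header_address, words[:words_sent])
    new_address = header_address + words_sent * WORD_BYTES
    pending = (new_address, words[words_sent:])
    return sent, pending


sent, pending = split_burst(0x2000, ["a", "b", "c", "d", "e"], 2)
```

The pending section is a complete burst in its own right, so a receiver needs no knowledge that a split ever happened.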

The first aspect of the present invention addresses the 
synchronisation problem for memory mapped network 
interfaces. The present invention uses a network interface 
containing snooping hardware which can be programmed to 
contain triggering values comprising either addresses, 
address ranges or other data which are to be matched. These 
data are termed "Tripwires". Once programmed, the 
interface monitors the data stream, including address data, 
passing through the interface for addresses and data which 
match the Tripwires which have been set. On a match, the 
snooping hardware can generate interrupts or increment event 
counters, or perform some other application-specified 
action. This snooping hardware is preferably based upon 
Content Addressable Memory (CAM). References herein to the 
"data stream" refer to the stream of data words being 
transferred and to the address data accompanying them. 
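
The Tripwire mechanism can be modelled in software as a table of address ranges checked against every address/data pair that passes through the interface. This is a sketch under stated assumptions: the class, its method names and the choice of counting matches are hypothetical, and a linear scan stands in for the parallel lookup that a Content Addressable Memory would perform in hardware.

```python
class TripwireTable:
    """Hypothetical software model of the CAM-based snooping hardware."""

    def __init__(self):
        self._ranges = []   # (start, end, tripwire_id), end exclusive
        self.counters = {}  # tripwire_id -> number of matches so far
        self.matched = []   # ids of matched tripwires, in match order

    def set_tripwire(self, tripwire_id, start, length=1):
        self._ranges.append((start, start + length, tripwire_id))
        self.counters[tripwire_id] = 0

    def snoop(self, address, word):
        """Inspect one address/data pair passing through the interface."""
        for start, end, tripwire_id in self._ranges:
            if start <= address < end:
                self.counters[tripwire_id] += 1
                self.matched.append(tripwire_id)
        return word  # the data itself passes through unchanged


table = TripwireTable()
# The receiver arms a tripwire on the last word of an expected message.
table.set_tripwire("end-of-message", start=0x2FFC)
for addr in range(0x2FF0, 0x3000, 4):
    table.snoop(addr, 0)
```

The match is detected from the data stream itself, so no separate synchronisation packet has to be sent.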



The invention thus provides in-band synchronisation by using 
synchronisation primitives which are programmable by user 
level applications, while still delivering high bandwidth 
and low latency. The programming of the synchronisation 
primitives can be made by the sending and receiving 
applications independently of each other and no 
synchronisation information is required to traverse the 
network. 

A number of different interfaces between the network 
interface and an application can be supported. These 
interfaces include VIA and the forthcoming Next Generation 
Input/Output (NGIO) standard. An interface can be chosen to 
best match an application's requirements, and changed as its 
requirements change. The network interface of the present 
invention can support a number of such interfaces 
simultaneously. 

The Tripwire facility supports the monitoring of outgoing 
as well as incoming data streams. These Tripwires can be 
used to inform a sending application that its DMA send 
operations have completed or are about to complete. 
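
One way to picture an outgoing Tripwire is as a watch on the last address of each DMA block, tagged with the application that asked for the transfer. The sketch below assumes that "a property of the data block" means the address of its final word and that words are 4 bytes wide; those choices, and all names used, are illustrative assumptions rather than details given in the text.

```python
class DmaCompletionTracker:
    """Hypothetical model of Tripwires on an outgoing DMA stream."""

    def __init__(self):
        self._tripwires = {}  # last address of a block -> application id
        self.completed = []   # applications whose transfers have finished

    def request_transfer(self, app_id, start, length_words, word_bytes=4):
        # Arm a tripwire on the final word of the block being sent.
        last = start + (length_words - 1) * word_bytes
        self._tripwires[last] = app_id

    def snoop_outgoing(self, address):
        # Called for every address leaving through the interface.
        app_id = self._tripwires.pop(address, None)
        if app_id is not None:
            self.completed.append(app_id)


tracker = DmaCompletionTracker()
tracker.request_transfer("app-A", start=0x100, length_words=4)
tracker.request_transfer("app-B", start=0x200, length_words=2)
for addr in (0x100, 0x104, 0x108, 0x10C, 0x200, 0x204):
    tracker.snoop_outgoing(addr)
```

Storing the application identity with the triggering value is what lets a shared DMA request queue notify only the application whose block has gone out.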

Memory-mapped network interfaces also have the potential to 
be used for communication between hardware entities. This is 
because memory mapped network interfaces are able to pass 
arbitrary memory bus cycles over the network. As shown in 
Fig. 4, it is possible to set up a memory aperture 254, in 
the NIC 252 of Computer 250, which is directly mapped via 
[...] onto an address region 257 of the I/O bus 253 of 
passive backplane 251. 

Using existing memory mapped interfaces, such as DEC Memory 
Channel or Dolphin SCI, an application running on Computer 
250, which requires use of the hardware device 255, would 
usually need a software process to interface between 
itself and the Network Interface Card (NIC) 252. This is 
because the NIC 252 would not appear at the hardware level 
in computer 250 as an instance of the remote hardware device 
255 but instead as a network card which has a memory 
aperture 254 mapped onto the hardware device. 

In a further aspect of the invention, we have appreciated 
that the interface of the present invention can be 
programmed to present the same hardware interface as the 
remote hardware device 255, and so appear at the hardware 
level in computer 250 to be an instance of the remote 
hardware device. If the network card 252 were an interface 
according to the present invention, so programmed, the 
remote hardware device 255 would appear as physically 
located within computer 250, in a manner transparent to all 
software. The hardware device 255 is able to be physically 
located either at the remote end of a dedicated link, or over 
a general network. The invention will support both general 
networking activity and remote hardware communication 
simultaneously on a single network card. 

Another aspect of the invention relates to a link-level 
communication protocol which can be used to support 
cut-through routing and forwarding. There is no need for an 
entire packet to arrive at an 
entity supporting the communication protocol, before data 
transmission can be started on an outgoing link. The 
invention also allows large bursts of data to be handled 
effectively without the need for a small physical network 
packet size such as that employed by an ATM network, it 
being possible to dynamically stop and restart a burst and 
regenerate all address information using hardware. 

A preferred embodiment of the various aspects of the 
invention will now be described with reference to the 
drawings in which: 

Figure 5 shows two or more computers connected by an 
embodiment of the present invention, using Network Interface 
Cards (NICs); 
Figure 6 shows in detail the various functional blocks 
comprising the NICs of Figure 5; 
Figure 7 shows the functional blocks of the NIC employed within 
a Field Programmable Gate Array (FPGA); 
Figures 8 and 8e show the communication protocol used in 
one embodiment of the invention; 
Figure 9 shows schematically hardware communication 
according to an embodiment of the invention; 
Figure 10 shows schematically a circular buffer abstraction 
according to one embodiment of the invention; 
Figure 11 shows schematically the system support for 
discrete message communication using circular buffers; 
Figure 12 shows a client-server interaction according to an 
embodiment of the invention; 
Figure 13 shows how the system of the present invention can 
support VIA; 
Figure 14 shows outgoing stream synchronisation according to 
an embodiment of the present invention; and 
Figure 15 shows a client-server interaction according to an 
embodiment of the invention using a hardware data source. 

Referring to Figure 5, computers 1, 2 use the present 
invention to exchange data. A plurality of other computers, 
such as 3, may participate in the data exchange if connected 
via optional network switch 4. 

Each computer 1, 2 is composed of a microprocessor central 
processing unit 5, 57, memory 6, 60, local cache memory [...] 
and system controller 8, 58. The system controller 8, 58 
interacts with its microprocessor 5, 57 to allow the 
microprocessor to exchange data with devices attached to I/O 
bus 9. Attached to I/O bus 9 are standard peripherals, 
such as a video adapter 10. Also attached to I/O bus 9 
is one or more network interfaces, in the form of NICs 
11, 56, which represent an embodiment of this invention. In 
computers 1, 2 the I/O bus is a standard PCI bus conforming 
to PCI Local Bus Specification, Rev. 2.1, although any other 
bus capable of supporting bus master operations can be used 
with suitable modification of the system controller, peripherals 
such as video card 10, and the interface to NIC 11, 56. 

Referring to Figure 6, each NIC comprises a memory 18, 19 
for storing triggering values, a receiver 15 for 
receiving a data stream, a comparator for comparing part of 
the data stream with the triggering values and a memory 
for storing information which will identify matched 
triggering values. More specifically, in the preferred 
embodiment each NIC 56, 11 is composed of a PCI to Local Bus 
bridge 12, a control Field Programmable Gate Array (FPGA) 
13, transmit (Tx) serialiser 14, fibre-optic transceiver 15, 
receive (Rx) de-serialiser 16, address multiplexer and latch 
17, CAM array 18, 19, 20, boot ROMs 21 and 22, static RAM 
23, FLASH ROM 24, and clock generator and buffer 25, 26. 
Figure 6 also shows examples of known chips which could be 
used for each component, for example boot ROM 21 could be an 
Altera EPC1 chip. 

Referring to Figure 7, FPGA 13 is comprised of functional 
blocks 27-62. The working of the blocks will be explained 
by reference to typical data flows. 

Operation of NIC 11 begins by computer 1 being started or 
reset. This operation causes the contents of boot ROM 21 to 
be loaded into FPGA 13 thereby programming the FPGA and, in 
turn, causing state machines 28, 37, 40, 43, 45, 46 and 47 
to be reset. 

Clock generator 25 begins running and provides a stable 
clock for the Tx serialiser 14. Clock buffer/divider 26 
provides suitable clocks for the rest of the system. 
Serialiser 14 and de-serialiser 16 are reset and remain in 
a reset condition until communication with another node is 
established and a satisfactory receive clock is regenerated 
by de-serialiser 16. 

PCI bridge 12 is also reset and loaded with the contents of 
boot ROM 22. Bridge 12 can convert (and re-convert at the 
target end) memory access cycles into I/O cycles and support 
legacy memory apertures, and as the rest of the NIC supports 
byte-enabled (byte-wide as well as word-wide) transfers, ROM 
22 can be loaded with any PCI configuration space 
information, and can thus emulate any desired PCI card 
transparently to microprocessor 5. 
[...] FLASH control state machine [...]. 
Programming of the FLASH memory is also handled by state 
machine 47 in conjunction with bridge 12. 

Data transfer could in principle commence at this point, but 
arbiter 40 is barred from granting bus access to master 
state machine 37 until a status bit has been set in one 
of the internal registers. This allows software to set up 
the Tripwires during the initialisation stage. 

Writes from computer 1 to computer 2 take place in the 
following manner. Microprocessor 5 writes one or more words 
to an address location defined by system controller 8 to be 
within NIC 11's address space. PCI to local bus bridge 12 
captures these writes and turns them into local bus protocol 
(discussed elsewhere in this document). If the writes are 
within the portion of the address space determined to be 
within the local control aperture of the NIC by register 
decode 48, then the writes take place locally to the 
appropriate register, Content Addressable Memory (CAM), 
Static RAM (SRAM) or FLASH memory area. Otherwise target 
state machine 28 claims the cycles and forwards them to 
protocol encoder 29. 
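
The decision described in this paragraph, namely whether a write is handled locally or forwarded over the link, amounts to an address range check. The sketch below is illustrative only: the region names follow the text, but the offsets and sizes are invented for the example and the function name is hypothetical.

```python
# Hypothetical layout of the local control aperture (offsets are assumptions).
LOCAL_REGIONS = {
    "registers": (0x0000, 0x0100),
    "cam":       (0x0100, 0x0200),
    "sram":      (0x0200, 0x0400),
    "flash":     (0x0400, 0x0800),
}


def route_write(offset):
    """Return the local target for an offset, or "forward" if the write
    falls outside the local control aperture and must go to the encoder."""
    for name, (start, end) in LOCAL_REGIONS.items():
        if start <= offset < end:
            return name
    return "forward"


local_target = route_write(0x0150)   # lands in the CAM region
remote_target = route_write(0x1000)  # outside the aperture
```

Writes into the CAM region are how software would program Tripwires, using the same store instructions as any other access.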
In the protocol encoder, byte-enable, parity data and 
control information are added first to an address and then 
to each word to be transferred in a burst, with a control 
bit marking the beginning of the burst and possibly a 
control bit marking the end of the burst. The control bit 
marking the beginning of the burst indicates that address 
data forming the header of the data burst comprises the 
first "data" word of the burst. Xon/Xoff-style management 
bits from block 31 are also added here. This protocol, 
specific to the serialiser 14 and de-serialiser 16, is also 
discussed elsewhere in this document. 

Data is fed on from encoder 29 to output multiplexer 30, 
reducing the pin count for FPGA 13 and matching the bus 
width provided by serialiser 14. Serialiser 14 converts a 
23-bit parallel data stream at 62MHz to a 1-bit data stream 
at approximately 1.5Gbit/s; this is converted to an optical 
signal by transceiver 15 and carried over a fibre-optic link 
to a corresponding transceiver 15 in NIC 56, part of 
computer 2. It should be noted that other physical layers 
and protocols are possible and do not limit the scope of the 
invention. 
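
The quoted serial rate can be checked from the parallel width and clock given above. The calculation below uses only those two figures; it ignores any line coding or framing overhead the serialiser may add, which the text does not specify.

```python
PARALLEL_BITS = 23   # width of the parallel stream, from the text
CLOCK_HZ = 62e6      # parallel clock, from the text

# Raw serial rate with no line-coding overhead taken into account.
line_rate_bits_per_s = PARALLEL_BITS * CLOCK_HZ
line_rate_gbit_per_s = line_rate_bits_per_s / 1e9
```

The product is about 1.43 Gbit/s, consistent with the "approximately 1.5Gbit/s" stated for the link.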

In NIC 56, the reconstructed digital signal is 
clock-recovered and de-serialised to 62MHz by block 16. Block 32 
expands the recovered 23 bits to 46 bits, reversing the 
action of block 30. Protocol decoder 33 checks that the 
incoming words have suitable sequences of control bits. If 
so, it passes address/data streams into command FIFO 34. If 
the streams have errors, they are passed into error FIFO 35; 
master state machine 37 is stopped; and an interrupt is 
raised on microprocessor 57 by block 53. Software is then 
used to decipher the incoming stream until a correct 
sequence is found, whereupon state machine 37 is restarted. 
When a stream arrives at the head of FIFO 34, master state 
machine 37 requests access to local bus 55 from arbiter 40. 
When granted, it passes first the address, then the 
following data onto local bus 55. Bridge 12 reacts to this 
address/data stream by requesting access to I/O bus 59 from 
system controller 58. When granted, it writes the required 
data into memory 60. 

Reads of computer 2's memory 60 initiated by computer 1 take
place in a similar manner. However, state machine 28, after
sending the address word, sends no other words; rather, it
waits for return data. Data is returned because master
state machine 37 in NIC 56 reacts to the arrival of a read
address by requesting a read of memory 60 via I/O bus 59 and
corresponding local bus bridge 12. This data is returned as
if it were write data flowing from NIC 56 to NIC 11, but
without an initial address. Protocol decoder 33 reacts to
this address-less data by routing it to read return FIFO 36,
whereupon state machine 28 is released from its wait and
microprocessor 5's read cycle is allowed to complete.
Should the address region be marked in NIC 56's bridge 12 as
read-prefetchable, then a number of words are returned; if
state machine 28 continues requesting data as if from a
local bus burst read, then subsequent words are fulfilled
directly from read return FIFO 36.

Should NIC 56 need to raise an interrupt on microprocessor
5, remote interrupt generator 54 causes state machine 28 to
send a word from NIC 56 to a mailbox register in NIC 11's
bridge 12. This will have been configured by software to
raise an interrupt on microprocessor 5.

Inevitably, since the clocks 25 in NICs 11 and 56 will run
at slightly different frequencies, there will be occasional
overrun conditions. Where the command FIFO 34 exceeds a
pre-programmed threshold value, an Xoff bit is sent to the
corresponding protocol encoder 29. This bit causes the
encoder to request that the sending state machine 28 stops,
if necessary in mid burst. Logic in bridge 12 takes care of
restarting the data burst when the corresponding Xon is
received some time later. This logic calculates a new
reference address for the unsent part of the data burst,
using the reference address in the header of the whole data
burst, and from a count of the number of data words which
are sent before the transfer is stopped. As, in this
embodiment, successive data words in a burst have
successively incrementing destination addresses, the
destination address of the first data word in the unsent
part of the data burst can easily be calculated.
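
By way of illustration only, the restart calculation described
above might be sketched as follows. The word size of 4 bytes
(32-bit data words at byte addresses) and the function name are
assumptions made for the example and are not taken from the
embodiment.

```python
WORD_BYTES = 4  # assumption: 32-bit data words at byte addresses


def resume_address(header_address, words_sent, word_bytes=WORD_BYTES):
    """Address to place in the header of the restarted (unsent) part."""
    if words_sent < 0:
        raise ValueError("words_sent must be non-negative")
    return header_address + words_sent * word_bytes


# A burst addressed to 0x1000 is stopped by Xoff after 3 words.
restart = resume_address(0x1000, 3)
```

The same arithmetic would apply with a decrementing address
pattern by negating the step.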

It is also possible that data may be read out of FIFO 34
faster than it is written in. In the event of this
happening, master state machine 37 uses pipeline delay 38 to
anticipate the draining of FIFO 34 and to terminate the data
burst on local bus 55. It then uses the CAM address
latch/counter 41 to restart the burst when more data arrives
in FIFO 34.

'Tripwires' are triggering values, such as addresses,
address ranges or other data, that are programmed into the
NIC to be matched. Preferably, the triggering values used as
tripwires are addresses. To meet timing requirements during
address match cycles (as data flows through the NIC), three
CAM devices are pipelined to reduce the match cycle time
from around 70 nanoseconds to less than 30 nanoseconds.

The programming of Tripwires takes place by microprocessor
5 writing to PCI bridge 12 via system controller 8 and I/O
bus 9. For the purpose of writing the Tripwire data, CAM
array 18, 19, 20 appears like conventional RAM to
microprocessor 5. For write cycles, this is done by CAM
controller 43 generating suitable control signals to enable
all three CAMs 18, 19, 20 for write access. Address
latch/counter 44 passes data to the CAMs unmodified. Address
multiplexer 41 is arranged to pass local bus data out on the
CAM address bus, where it is latched at the moment addresses
are valid on the local bus by latch 17. For read cycles, the
process is similar, except that only CAM 18 is arranged to
be enabled for read access, and address latch/counter 44 has
its data flow direction reversed. So far as microprocessor 5
is concerned, it sees the expected data returned, since the
memory arrays in CAMs 18, 19, 20 either contain the same
data or internal flags indicating that particular segments
of the memory have not yet been written and should not
participate in match cycles.

Owing to the nature of the address/data bus being comprised
of bursts of data, according to the preferred local bus
protocol, the actual data stream cannot be used for
monitoring address changes. A burst starts with the address
of the first data word followed by an arbitrary number of
data words. The address of the data words is implicit and
increments from the start address. For normal inbound or
outbound data transfer operations, address latch/counter 44
is loaded with the address of each new data burst, and
incremented each time a valid data item is presented on
internal local bus 55. CAM control state machine 43 is
arranged to enable each CAM 18, 19, 20 in sequence for a
compare operation as each new address is output by
latch/counter 44. This sequential enabling of the CAMs
combined with their latching properties permits the access
time for a comparison operation to be reduced by a factor of
three (there being three CAMs in this implementation, other
implementations being possible) from 70ns to less than 30ns.
The CAM op-code for each comparison operation is output from
one of the internal registers 49 via address multiplexers 41
and 17. The op-code is actually latched by address
multiplexer 17 at the end of a read/write cycle, freeing the
CAM address bus to return the index of matched Tripwires
after comparison operations.

The Tripwire data (i.e. the addresses to be monitored) is
written to sequential addresses in the CAM array. During
the comparison operation (cycle), all valid Tripwires are
compared in parallel with the address of the current data,
be it inbound or outbound. During the operation, masking
operations may be performed, depending on the type of CAM
used, allowing certain bits of the address to be ignored
during the comparison. In this way, a Tripwire may actually
represent a range of addresses rather than one particular
address.
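
The matching behaviour described above can be modelled in
software, purely as an illustrative sketch. The class and
function names, the 32-bit mask width and the 4-byte word step
are assumptions for the example; a real CAM compares all
entries in parallel, whereas this model simply loops.

```python
class TripwireCam:
    """Toy model of a CAM holding (value, mask) Tripwire entries."""

    def __init__(self):
        self.entries = []  # list index = offset in the CAM array

    def program(self, value, mask=0xFFFFFFFF):
        """Store a Tripwire; cleared mask bits are ignored on compare."""
        self.entries.append((value & mask, mask))
        return len(self.entries) - 1

    def match(self, address):
        """Return the offsets of all Tripwires hit by this address."""
        return [i for i, (value, mask) in enumerate(self.entries)
                if (address & mask) == value]


def snoop_burst(cam, start_address, n_words, word_bytes=4):
    """Step through a burst as an address counter would, collecting hits."""
    hits = []
    for k in range(n_words):
        address = start_address + k * word_bytes
        for offset in cam.match(address):
            hits.append((offset, address))
    return hits


cam = TripwireCam()
exact = cam.program(0x2008)                  # a single address
page = cam.program(0x3000, mask=0xFFFFFF00)  # a 256-byte range
hits = snoop_burst(cam, 0x2000, 4)           # words at 0x2000..0x200C
```

Only the third word of the example burst lands on a Tripwire,
even though its address never appears explicitly in the data
stream.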

When the CAM array signals a match found (i.e. a Tripwire
has been hit), it returns the address of the Tripwire (its
offset in the CAM array) via the CAM address bus to the
tripwire FIFO 42. Two courses of action are then possible,
depending on how internal registers 49 have been programmed.


One course of action is for state machine 45 to request that
an interrupt be generated by management logic 53. In this
case, an interrupt is received by microprocessor 5, and
software is run which services the interrupt. Normally this
would involve microprocessor 5 reading the Tripwire address
from FIFO 42, matching the address with a device-driver
table, signalling the appropriate process, marking it
runnable and rescheduling.

An alternative course of action is for state machine 45 to
cause records to be read from SRAM 23 using state machine
46. A record comprises a number of data words: an address
and two data words. These words are programmed by the
software just before the Tripwire information is stored in
the CAM. When a Tripwire match is made, the address in
latch 44 is left shifted by two to form an address index for
SRAM 23. The first word is then read by state machine 46
and placed on local bus 55 as an address in memory 6. A
fetch-and-increment operation is then performed by state
machine 45, using the second and third words of the SRAM
record to first AND and then OR, or else INCREMENT, the data
referred to in memory 6. A bit in the first word read by the
state machine indicates which operation it should take.
In the case of an INCREMENT, the first data word also
indicates the amount to increment by.
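
The read-modify-write performed on a Tripwire match might be
sketched as below. This is an illustration under stated
assumptions: the record layout (a separate boolean standing in
for the operation-select bit), the 32-bit word width and the
use of a dictionary for host memory are all hypothetical.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF  # assumption: 32-bit memory words


@dataclass
class TripwireRecord:
    address: int     # location in host memory to update
    word1: int       # AND mask, or the increment amount
    word2: int       # OR mask (unused for INCREMENT)
    increment: bool  # stands in for the operation-select bit


def apply_record(memory, rec):
    """Apply the record's update to memory (a dict of words); return old value."""
    old = memory.get(rec.address, 0)
    if rec.increment:
        new = (old + rec.word1) & MASK32
    else:
        new = ((old & rec.word1) | rec.word2) & MASK32
    memory[rec.address] = new
    return old


memory = {0x100: 7, 0x104: 0xF0}
# Event counter: add 1 on each match.
apply_record(memory, TripwireRecord(0x100, 1, 0, increment=True))
# Reschedule flag: keep all bits (AND) and set bit 0 (OR).
apply_record(memory, TripwireRecord(0x104, MASK32, 0x1, increment=False))
```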

These alternatives enable the implementation of such
primitives as an event counter incremented on tripwire
matches, or the setting of a system reschedule flag. This
mechanism enables multiple applications to process data
without the requirement for hardware interrupts to be
generated after receipt of each network packet.

In the case of the interrupt followed by a Tripwire FIFO
read, the device driver is presented with a list of
endpoints which require attention. This list improves
system performance as the device driver is not required to
scan a large number of memory locations looking for such
endpoints.

The device driver is not required to know where the memory
locations which have been used for synchronisation are. It
is also not required to have any knowledge of, or take part
in, the application level communication protocol. All
communication protocol processing can be performed by the
application and different applications are free to use
differing protocols for their own purposes, and one device
driver instance may support a number of such applications.

There is also a problem connected with programming a DMA
engine that is addressed by an aspect of the invention.
Conventional access to DMA engines is moderated either by a
single system device driver, which requires (slow) context
switches to access, or by virtualisation of the registers by
system page fault, also requiring (multiple) context
switches. The problem is that it is not safe for a user
level application to directly modify the DMA engine
registers or a linked list DMA queue, because this must be
done atomically. In most systems, user applications cannot
atomically update the DMA queue as they can be descheduled
at any moment.


The invention addresses this problem by using hardware FIFO
50 to queue DMA requests from applications. Each
application wanting to request DMA transfers sets up a
descriptor, containing the start address and the length of
the data to be transferred, in its local memory and posts
the address of the descriptor to the DMA queue, whose
address is common to all applications. This can be arranged
by mapping a single page containing the physical address of
the DMA queue as a write-only page into the address space of
all user applications as they are initialised.

As soon as DMA work queue FIFO 50 is not empty, local bus 55
is not busy and the DMA engine in bridge 12 is also not
busy, master/target/DMA arbiter 40 grants DMA state machine
51 access to local bus 55. Using the address posted by the
application in FIFO 50, state machine 51 then uses bridge 12
to read the descriptor in memory 6 into the descriptor block
52. State machine 51 then posts the start address and
length information held in block 52 into the DMA engine in
bridge 12.



When the DMA process is complete, bridge 12 notifies state
machine 51 of the completion. The state machine then uses
data from descriptor block 52 to write back a completion
descriptor in memory 6. Optionally, an interrupt can also
be raised on microprocessor 5, although a Tripwire may
already have been crossed to provide this notification early
in order to minimise the delay bringing the relevant
application back onto microprocessor 5's run queue. This is
shown later in this document.

Should queue 50 be full, then state machine 51 writes a
failure code back into the completion field of the
descriptor that the application has just attempted to place
on the queue. Thus the application does not need to read the
[...]. All applications can safely share the same hardware
posting address, and no time-consuming virtualisation or
system device driver process is necessary.
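
The posting scheme might be modelled as follows, as a sketch
only. The status codes, the dictionary-based descriptors and
the class name are assumptions introduced for illustration; the
point shown is that an application performs a single write of a
descriptor address and later inspects the completion field of
its own descriptor.

```python
from collections import deque

PENDING, DONE, QUEUE_FULL = 0, 1, -1  # hypothetical completion codes


class DmaQueue:
    """Toy model of the shared hardware FIFO of descriptor addresses."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def post(self, memory, descriptor_address):
        """Model one write of a descriptor address to the posting address."""
        if len(self.fifo) >= self.depth:
            # Queue full: report failure in the descriptor itself.
            memory[descriptor_address]["completion"] = QUEUE_FULL
            return
        self.fifo.append(descriptor_address)

    def service(self, memory, transfer):
        """Fetch the oldest descriptor, run the transfer, write completion."""
        if not self.fifo:
            return False
        desc = memory[self.fifo.popleft()]
        transfer(desc["start"], desc["length"])
        desc["completion"] = DONE
        return True


memory = {
    0x10: {"start": 0x8000, "length": 64, "completion": PENDING},
    0x20: {"start": 0x9000, "length": 32, "completion": PENDING},
}
queue = DmaQueue(depth=1)
queue.post(memory, 0x10)
queue.post(memory, 0x20)  # rejected: the queue already holds one entry
done = []
queue.service(memory, lambda start, length: done.append((start, length)))
```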

Should any operation take longer than a preset number of PCI
cycles, timeout logic 61 is activated to terminate the
current cycle and return an interrupt through block 53.

Another aspect of the invention relates to the protocol
which is preferably used by the NIC. This protocol uses an
address and some additional bits in its header. This allows
the transfer of variable length packets with simple routines
for Segmentation and Reassembly (SAR) that are transparent
to the sending or receiving codes. This is also done
without the need to have an entire packet arrive before
segmentation, reassembly or forwarding can occur, allowing
the data to be put out on the ongoing link immediately. This
enables data to traverse many links without significantly
adding to the overall latency. The packets may be
fragmented and coalesced on each link, for example between
the NIC and a host I/O bus bridge, or between the NIC and
another NIC. We term this cut-through routing and
forwarding. In a network carrying a large number of streams,
cut-through forwarding and routing enables small packets to
pass through the network without any delays caused by large
packets of other streams. While other network physical
layers such as ATM also provide the ability to perform
cut-through forwarding and routing, they do so at the cost
of requiring all packets to be of a fixed small size.


Figure 8 shows an example of how this protocol has been
implemented using the 23-bit data transfer capability of
HP's GLINK chipset (serialiser 14 and de-serialiser 16). PCI
to local bus bridge 12 provides a bus of 32 address/data
bits, 4 parity bits and 4 byte-enable bits. It also
provides an address valid signal (ADS) which signifies that
a burst is beginning, and that the address is present on the
address/data bus. The burst continues until a burst last
signal (BLAST) is set active, signifying the end of a burst.
It provides a read/write signal, and some other control
signals that need not be transferred to a remote computer.
Figure 8A shows how this protocol is used to transfer an n
data word burst 63. The data traffic closely mirrors that
used on the PCI bus, but uses fewer signals.

The destination address always precedes each data burst.
Therefore, the bursts can be of variable size, and can be
split or coalesced, by generating fresh address words, or by
removing address words where applicable. In the preferred
embodiment, sequential data words are destined for
sequentially incrementing addresses. However, data words
having sequentially decrementing addresses might also be
used, or any other pattern of addresses may be used so long
as it remains easy to calculate. So far as the endpoints
are concerned, exactly the same data is transferred to
the same locations. The benefits are that packets
can be of any size at all, reducing the overhead of sending
an address; packets can be split (and addresses regenerated
to continue) by network switches to provide quality of
service; and receivers need not wait for a complete packet
to arrive to begin decoding work.
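
Splitting and coalescing of bursts might be sketched as
follows, assuming the preferred incrementing address pattern.
The representation of a burst as an (address, words) pair, the
4-byte word size and the function names are assumptions made
for the example.

```python
WORD_BYTES = 4  # assumption: 32-bit words at byte addresses


def split_burst(address, words, k):
    """Split one burst after k words, generating a fresh address word."""
    if not 0 < k < len(words):
        raise ValueError("split point must fall inside the burst")
    return (address, words[:k]), (address + k * WORD_BYTES, words[k:])


def coalesce(first, second):
    """Merge two bursts if the second continues where the first ends."""
    addr_a, words_a = first
    addr_b, words_b = second
    if addr_b != addr_a + len(words_a) * WORD_BYTES:
        return None  # not contiguous: both address words must be kept
    return (addr_a, words_a + words_b)


head, tail = split_burst(0x4000, [10, 11, 12, 13, 14], 2)
merged = coalesce(head, tail)
```

Either form delivers the same words to the same destination
addresses, which is why intermediate links are free to choose
the fragmentation.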

Also, the destination address given in the header may be for
the 'nth' data word in the burst, rather than for the first,
although using the first data word address is preferred.

Figure 8b shows how the protocol of Figure 8a is transcribed
onto the G-LINK physical layer. The first word in any
packet contains an 18-bit network address. Each word of 63
[...] and low addresses or data, corresponding to the
address/data bus; the next 4 bits carry either byte enables
or parity data. During the address phase, the byte enable
field (only 2 bits of which are available, owing to the
limitations of [...]) is used to carry a 2-bit code
indicating read, write or escape packet use. Escape packets
are normally used to carry diagnostic or error information
between nodes, or as a means of carrying the Xon/Xoff-style
protocol when no other data is in transit. The G-LINK nCAV
signal corresponds to the ADS signal of 63; nDAV is active
throughout the rest of the burst and the combination of nDAV
inactive and nCAV inactive signals the end of a burst, or
nCAV active indicates the immediate beginning of another
burst.


Figure 8c shows a read data burst 65; this is the same as
a write burst 64, except data bit 16 is set to 0. On the
outbound request, the data field contains the network
address for the read data to be returned to. When the data
for a read returns 66, it travels like a write burst, but is
signified by there only being one nCAV active (signifying
the network address) along with the first word. An
additional bit, denoted FLAG in Figure 8, is used to carry
Xon/Xoff-style information when a burst is in progress. It
is not necessary therefore to break up a burst in order to
send an Escape packet containing the Xon/Xoff information.
The FLAG bit also serves as an additional end of packet
indicator.

In Figure 8c, 67, 68 show an escape packet; after the
network address, this travels with 68 or without 67 a
payload as defined by data bit 16 in the first word of the
burst.

In a full networked implementation, an extra network address
word may precede each of these packets. Other physical
layer or network layer solutions are possible, without
compromise to this patent application, including fibre
channel parts (using 8B/10B encoding) and conventional
networks such as ATM or even Ethernet. The physical layer
only needs to provide some means of identifying data from
non-data and the start of one burst from the end of a
previous one.

A further aspect of the invention relates to the
distribution of hardware around a network. One use of a
network is to enable one computer to access a hardware
device whose location is physically distant. As an example,
consider the situation shown in Figure 9, where it is
required to display the images viewed by the camera 70
(connected to a frame-grabber card 69) on the monitor which
is connected to computer 72. The NIC 73 is programmed
from Boot ROM 22 to present the same hardware interface as
that of the frame-grabber card 69. Computer 72 can be
running the standard application program as provided by a
third party vendor, which is unaware that the system has
been distributed over a network. All control reads and
writes to the frame-grabber 69 are transparently forwarded
by the NIC 73 and there is no requirement for an extra
process to be placed in the data path to interface between
the application running on CPU 74 and the NIC 73. Passive
PCI I/O back-plane 71 requires simply a PCI bus clock and
arbiter, i.e. no processor, memory or cache. These functions
can be implemented at very low cost.

The I/O buses are conformant to PCI Local Bus specification
2.1. This PCI standard supports the concept of a bridge
between two PCI buses. It is possible to program the NIC 73
to present the same hardware interface as a PCI bridge
between Computer 72 and passive back-plane 71. Such
programming would enable a plurality of hardware devices to
be connected to back-plane 71 and controlled by computer 72
without the requirement for additional interfacing software.
Again, it should be clear that the invention will support
both general networking activity and this remote hardware
communication, simultaneously using a single network card.

A circular buffer abstraction will now be discussed as an
example of the use of the NIC by an application. The
circular buffer abstraction is designed for applications
which require a producer/consumer software stream
abstraction, with the properties of low latency and high
bandwidth data transmission. It also has the properties of
responsive flow control and low buffer space requirements.
Fig. 10 shows a system comprising two software processes,
applications 102 and 103, on different computers 100, 101.
Application 102 is producing some data. Application 103 is
awaiting the production of data and then consuming it. The
circular buffer 107 is composed of a region of memory on
Computer 101 which holds the data and two memory locations,
RDP 106 and WRP 109. WRP 109 contains the pointer to the
next byte of data to be written into the buffer, while RDP
106 contains the pointer to the last byte of data to be read
from the buffer. When the circular buffer is empty, then WRP
is equal to RDP + 1 modulo wrap-around of the buffer.
Similarly, the buffer is full when WRP is equal to RDP - 1.
There are also private values of WRP 108 and RDP 111 in the
caches of computer 100 and computer 101 respectively. Each
computer 100, 101 may use the value of WRP and RDP held in
its own local cache memory to compute how much data can be

written to or read from the buffer at any point in time,
without the requirement for communication over the network.

When the circular buffer 107 is created, the producer sets
up a Tripwire 110, which will match on a write to the RDP
pointer 106, and the consumer sets a Tripwire 113, which
will match on a write to the WRP pointer 109.
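
The pointer arithmetic implied by the empty and full
conditions stated for the circular buffer might be sketched as
follows. The class name and the use of indices rather than
machine addresses are assumptions for illustration; the
conventions (WRP is the next byte to write, RDP is the last
byte read, empty when WRP = RDP + 1, full when WRP = RDP - 1)
follow the description.

```python
class RingPointers:
    """Pointer arithmetic for the circular buffer convention in the text.

    wrp: index of the next byte to be written
    rdp: index of the last byte that was read
    """

    def __init__(self, size):
        self.size = size
        self.rdp = 0
        self.wrp = 1  # empty: WRP == RDP + 1 (modulo size)

    def readable(self):
        return (self.wrp - self.rdp - 1) % self.size

    def writable(self):
        # full: WRP == RDP - 1 (modulo size)
        return (self.rdp - 1 - self.wrp) % self.size

    def produce(self, n):
        if n > self.writable():
            raise ValueError("would over-run the buffer")
        self.wrp = (self.wrp + n) % self.size

    def consume(self, n):
        if n > self.readable():
            raise ValueError("not enough data")
        self.rdp = (self.rdp + n) % self.size


ring = RingPointers(size=8)
```

With these conventions a buffer of N bytes holds at most N - 2
bytes of unread data. Each side needs only its own pointer and
a possibly stale copy of the other's, so both computations can
be made from locally cached values.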

If consumer application 103 attempts to read data from the
circular buffer 107, it first checks to see if the
buffer is empty. If so, application 103 must wait until the
buffer is not empty, determined when WRP 109 has been seen
to be incremented. During this waiting period, application
103 may either block, requesting an operating system
reschedule, or poll the WRP 109 pointer.

If producer application 102 decides to write to the circular
buffer 107, it can do so while the buffer is not full. After
writing some data, application 102 updates its local cached
value of WRP 108, and writes the updated value to the memory
location 109 over the network. When the value of WRP 109 is
updated, the Tripwire 113 will match as has been previously
described.

If consumer application 103 is not running on CPU 116 when
some data is written into the buffer and Tripwire 113
matches, NIC 115 will raise a hardware interrupt 114. This
interrupt causes CPU 116 to run device driver software
contained within operating system 118. The device driver
would service the interrupt by reading the tripwire FIFO 42
on NIC 115 and determine, from the value read, the system
identifier for application 103. The device driver can then
request that operating system 118 reschedule application
103. The device driver would then indicate that the tripwire
should not generate a hardware interrupt until
application 103 has been next descheduled and subsequently
another Tripwire match has occurred.

Note that the system identifier for each running application
is loaded into internal registers 49 each time the
operating system reschedules. This enables the NIC to
determine the currently running application, and so make the
decision whether or not to raise a hardware interrupt for a
particular application given a Tripwire match.


Hence, once consumer application 103 is again running on the
processor, further writes to the circular buffer 107 by
application 102 may occur without triggering further
hardware interrupts. Application 103 now reads data from the
circular buffer 107. It can read data until the buffer
becomes empty (detected by comparing the values of RDP and
WRP 111, 109). After reading, application 103 will update
its local value of RDP 111 and finally write the updated
value of RDP to memory location 106 over the network.


If producer application 102 had been blocked on a full 
buffer, this update of RDP 106 would generate a Tripwire 
match 110, resulting in application 102, being unblocked and 
able to write more data into the buffer 107. 


In normal operation, application 102 and application 103 
could be operating on different parts of the circular buffer 
simultaneously without the need for mutual exclusion 
mechanisms or Tripwire. 


The most important properties of the data structure are that
the producer and the consumer are able to process data
without hindrance from each other and that flow control is
explicit within the software abstraction. Data is streamed
through the system. The consumer can remove data from the
buffer at the same time as the producer is adding more data.
There is no danger of buffer over-run, since a producer will
never transmit more data than can fit in the buffer.

The producer only ever increments WRP 108, 109 and reads RDP
106, and the consumer only ever increments RDP 106, 111, and
reads WRP 109. Inconsistencies in the values of WRP and RDP
seen by either the producer or consumer either cause the
consumer to not process some valid data (when RDP 106 is
inconsistent with 111), or the producer to not write some
more data (when 109 is inconsistent with 108), until the
inconsistency has been resolved. Neither of these
occurrences causes incorrect operation or performance
degradation so long as they are transient.
It should also be noted that on most computer architectures,
including the Alpha AXP and Intel Pentium ranges, computer
100 can store the value of the RDP 106 pointer in its
processor cache, since the producer application 102 only
reads the pointer 106. Any remote writes to the memory
location of the pointer 106 will automatically
invalidate the copy in the cache, causing the new value to
be fetched from memory. This process is automatically
carried out and managed by the system controller 8. In
addition, since computer 101 keeps a private copy of the RDP
pointer 111 in its own cache, there is no need for any
remote reads of RDP pointer values during operation of the
circular buffer. Similar observations can also be made for
the WRP pointer 109 in the memory of computer 101 and the
WRP pointer 108 in the cache of computer 100. This feature
of the buffer abstraction ensures that high performance and
low latency are maintained. Responsive application level
flow-control is possible because the cached pointer values
can be exposed to the user-level applications 102, 103.

A further enhancement to the above arrangement can be used
to provide support for applications which would like to
exchange data in discrete units. As shown in Fig. 11, in
addition to the system described in Fig. 10, the system
maintains a second circular buffer 127 of updated WRP 129
values corresponding to buffer 125. This second buffer 127
is used to indicate to a consumer how much data to consume
in order that data be consumed in the same discrete units as
it was produced. Note that circular buffer 125 contains
the data to be exchanged between the applications 122 and
123.
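
The use of a second buffer of write-pointer values to preserve
unit boundaries might be sketched as follows. This is a
simplified, single-process illustration: the class and method
names are hypothetical, a Python deque stands in for the second
circular buffer, the read pointer here denotes the next byte to
read, and the full and empty checks are omitted.

```python
from collections import deque


class MessageRing:
    """Byte ring plus a queue of write-pointer values, one per unit."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.size = size
        self.wrp = 0               # next byte to write
        self.rdp = 0               # next byte to read (simplified convention)
        self.boundaries = deque()  # stands in for the second circular buffer

    def send(self, payload):
        """Write one unit, then publish the updated write pointer."""
        for b in payload:
            self.data[self.wrp] = b
            self.wrp = (self.wrp + 1) % self.size
        self.boundaries.append(self.wrp)

    def receive(self):
        """Consume exactly one unit, as delimited by the published pointer."""
        end = self.boundaries.popleft()
        out = bytearray()
        while self.rdp != end:
            out.append(self.data[self.rdp])
            self.rdp = (self.rdp + 1) % self.size
        return bytes(out)


ring = MessageRing(16)
ring.send(b"hello")
ring.send(b"hi")
first = ring.receive()
second = ring.receive()
```

The consumer recovers the two units separately even though
their bytes are contiguous in the data buffer.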


The producer, application 122, writes data into buffer 125,
updating the pointer WRP 129, as previously described. Once
data has been placed in buffer 125, application 122 then
writes the new value of the WRP 129 pointer into buffer 127.
At the same time it also manipulates the pointer WRP 131. If
either of these write operations does not complete then the
application level write operation is blocked until some data
is read by the consumer application 123. The Tripwire
mechanism can be used as previously described, for either
application to block on either a full or empty buffer pair.

The consumer application 123 is able to read from both
buffers 125 and 127, in the process updating the RDP
pointers 133, 135 in its local cache and RDP pointers 124,
126 over the network in the manner previously described. A
data value read from buffer 127 indicates an amount of data
which had been written into buffer 125. This value may be
used by application level or library software 123 to
[...] 122.



The NIC can also be used to directly support a low latency
Request/Response style of communication, as seen in the
Common Object Request Broker Architecture (CORBA) and
Network File System (NFS), as well as transactional systems
such as databases. Such an arrangement is shown in Fig. 12,
where application 142 on computer 140 acts as a client
requesting service from application 143 on computer 141,
which acts as a server. The applications interact via memory
mappings using two circular buffers 144 and 145, one
contained in the main memory of each computer. The circular
buffers operate as previously described, and also can be
configured to transfer data in discrete units as previously
described.

Application 142, the client, writes a request 147 directly
into the circular buffer 145, via the memory mapped
connection(s), and waits for a reply by waiting on data to
arrive in circular buffer 144. Most Request/Response systems
use a process known as marshalling to construct the request
and use an intermediate buffer in memory of the client
application to do the marshalling. Likewise marshalling is
used to construct a response, with an intermediate buffer
being required in the memory of the server application.
Using the present invention, marshalling can take place
directly into the circular buffer 145 of the server as
shown. No intermediate storage of the request is necessary
at either the client or server computers 140, 141.

The server application 143 notices the request (possibly
using the Tripwire mechanism) and is able to begin
unmarshalling the request as soon as it starts to arrive in
the buffer 145. It is possible that the server may have
started to process the request 149 while the client is still
marshalling and transmitting, thus reducing latency in the
communication.

After processing the request, the server writes the reply
146 directly into buffer 144, unblocking application 142
(using the Tripwire mechanism), which then unmarshalls and
processes the reply 148. Again, there is no need for
intermediate storage, and unmarshalling by the client may be
overlapped with marshalling and transmission by the server.
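
The overlap of marshalling and unmarshalling might be
illustrated as follows. The wire format (a 16-bit opcode, a
16-bit name length, then the name) and all identifiers are
hypothetical and chosen only for the example; the sketch shows
a receiver decoding fields as soon as enough bytes have
arrived, without waiting for the whole request.

```python
import struct


def marshal_request(op, name):
    """Yield the request field by field, as it would be written out."""
    encoded = name.encode("utf-8")
    yield struct.pack("!HH", op, len(encoded))  # header: opcode, name length
    yield encoded


class Unmarshaller:
    """Consumes bytes incrementally; fields appear once complete."""

    def __init__(self):
        self.pending = bytearray()
        self.op = None
        self.name_len = None
        self.name = None

    def feed(self, chunk):
        """Add newly arrived bytes; return True once the request is whole."""
        self.pending += chunk
        if self.op is None and len(self.pending) >= 4:
            self.op, self.name_len = struct.unpack("!HH", self.pending[:4])
            del self.pending[:4]
        if (self.op is not None and self.name is None
                and len(self.pending) >= self.name_len):
            self.name = bytes(self.pending[:self.name_len]).decode("utf-8")
            del self.pending[:self.name_len]
        return self.name is not None


chunks = list(marshal_request(7, "report.txt"))
server = Unmarshaller()
header_complete = server.feed(chunks[0])  # opcode known, name still in flight
```

After the first chunk the server already knows the opcode and
could begin dispatching while the remainder is transmitted.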

A further useful and novel property of a Request/Response
system built using the present invention is that data may
be written into the buffer either from software running on a
CPU, or from any hardware device contained in the computer
system. Fig. 15 shows a Request/Response system which is a
file serving application. The client application 262 writes
a request 267 for some data held on disks controlled by 271.
The server application 263 reads 269 and decodes the request
from its circular buffer 265 in the manner previously
described. It then performs authentication and authorisation
on the request according to the particular application.

If the request for data is accepted, the server application
263 uses a two-part approach to send its reply. Firstly, it
writes, into the circular buffer 264, the software generated
header part of the reply 266. The server application 263
then requests 273 that the disk controller 271 send the
required data part of the reply 272 over the network to
circular buffer 264. This request to the disk controller
takes the form of a DMA request, with the target address
being an address on I/O bus 270 which has been mapped onto
circular buffer 264. Note that the correct offset is applied
to the address such that reply data 272 from the disk is
placed immediately following the header data 266.

Before initiating the request 273, the server application
263 can ensure that sufficient space is available in the
buffer 264 to accept the reply data. Further, it is not
necessary for the server application 263 to await the
completion of request 273. It is possible for the client
application 262 to have set a Tripwire 274 to match once the
reply data 272 has been received into buffer 264. This match
can be programmed to increment the WRP pointer associated
with buffer 264, rather than requiring application 263 to
increment the pointer as previously described. If a request
fails, then a timeout mechanism at the level of the client
application 262 would detect the failure and retry the
operation.
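
By way of illustration only, the address arithmetic implied by the two-part reply can be sketched in Python as follows. The names `CircularBuffer` and `plan_reply` are hypothetical, as is the convention of keeping one byte unused so that equal pointers always mean "empty". The sketch also ignores the splitting of a transfer that wraps past the end of the buffer.

```python
from dataclasses import dataclass


@dataclass
class CircularBuffer:
    base: int      # I/O bus address onto which the buffer is mapped (assumed)
    size: int      # buffer length in bytes
    wrp: int = 0   # write pointer: offset of the next byte to be written
    rdp: int = 0   # read pointer: offset of the first byte still to be read

    def free_space(self) -> int:
        # Assumption: one byte stays unused so wrp == rdp means "empty".
        return (self.rdp - self.wrp - 1) % self.size


def plan_reply(buf: CircularBuffer, header_len: int, data_len: int):
    """Return (dma_target, tripwire_addr, new_wrp), or None if no room.

    dma_target    : bus address handed to the disk controller, offset so
                    that the data lands immediately after the header.
    tripwire_addr : address of the last data byte; a match here means the
                    whole reply has arrived.
    new_wrp       : value the write pointer takes once the Tripwire fires.
    """
    if data_len < 1 or header_len + data_len > buf.free_space():
        return None
    data_off = (buf.wrp + header_len) % buf.size
    last_off = (data_off + data_len - 1) % buf.size
    new_wrp = (data_off + data_len) % buf.size
    return buf.base + data_off, buf.base + last_off, new_wrp
```

In this sketch the server checks `free_space()` before issuing the DMA request, and need not wait for the transfer to finish, because the pointer update is tied to the Tripwire address rather than to the server application.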

It is also possible for the client application 262 to
arrange that reply data 272 be placed in some other data
structure (such as a kernel buffer-cache page), through
manipulation of 169 and 167 as described later. This is
useful when 264 is not the final destination of the reply
data, so preventing a final memory copy operation by the
client. The server application 263 would be unaware of this
client side optimisation.

By use of this mechanism, the processing load on the server 
is reduced. The requirement for the server application to 
wait for completion of its disk requests is removed. The
requirement for high bandwidth streams of reply data to pass
through the server's system controller, memory, cache or CPU
is also removed.

As previously stated, the NIC of the present invention could
be used to support the Virtual Interface Architecture (VIA)
Standard. Fig. 13 shows two applications communicating using
VIA. Application 152 sends data to application 153 by
first writing the data to be sent into a region of its
memory, shown as block 154. Application 152 then builds a
transmit descriptor 156, which describes the location of
block 154 and the action required by the NIC (in this case
data transmission). This descriptor is then placed onto the
TxQueue 158, which has been mapped into the user-level
address-space of application 152. Application 152 then
finally writes to the doorbell register 160 in the NIC 162
to notify the NIC that work has been placed on the TxQueue
158.

Once the doorbell register 160 has been written, the NIC
162 can determine, from the value written, the address in
physical memory of the activated TxQueue 158. The NIC 162
reads and removes the descriptor 156 from the TxQueue 158,
determines from the descriptor 156 the address of data
block 154 and invokes a DMA engine 164 to transmit the data
contained in block 154. When the data is transmitted 168,
the NIC 162 places the descriptor 156 on a completion queue
166, which is also mapped into the address space of
application 152, and optionally generates a hardware
interrupt. The application 152 can determine when data has
been successfully sent by examining queue 166.
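
Purely as an illustrative sketch, the transmit sequence just described (descriptor, TxQueue, doorbell, completion queue) can be modelled in Python. The class and field names are hypothetical, the doorbell value is assumed to be a simple queue identifier rather than a physical address, and the "wire" is just a list.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    addr: int                 # start of the data block in host memory
    length: int               # number of bytes in the block
    action: str = "transmit"  # action required of the NIC


@dataclass
class Nic:
    memory: bytearray                           # stands in for host memory
    queues: dict = field(default_factory=dict)  # id -> (TxQueue, completion queue)
    wire: list = field(default_factory=list)    # blocks "sent" onto the network

    def ring_doorbell(self, queue_id: int) -> None:
        # The value written to the doorbell selects the activated TxQueue.
        tx_queue, completion_queue = self.queues[queue_id]
        while tx_queue:
            desc = tx_queue.popleft()           # read and remove the descriptor
            block = bytes(self.memory[desc.addr:desc.addr + desc.length])
            self.wire.append(block)             # the DMA engine transmits the block
            completion_queue.append(desc)       # report completion to the application
```

An application would place a `Descriptor` on its TxQueue, call `ring_doorbell`, and later examine its completion queue to learn that the block has been sent.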

When application 153 is to receive data, it builds a receive
descriptor 157 describing where the incoming data should be
placed, in this case block 155. Application 153 then places
descriptor 157 onto RxQueue 159, which is mapped into its
user-level address-space. Application 153 then writes to
the doorbell register 161 to indicate that its RxQueue 159
has been activated. It may choose to either poll its
completion queue 163, waiting for data to arrive, or block
until data has arrived and a hardware interrupt generated.
The NIC 165 in computer 151 services the doorbell register
161 write by first removing the descriptor 157 from the
RxQueue 159. The NIC 165 then locates the physical pages of
memory corresponding to block 155 and described by the
receive descriptor 157. The VIA standard allows these
physical pages to have been previously locked by application
153 (preventing the virtual memory system moving or removing
the pages from physical memory). However, the NIC is also
capable of traversing the page-table structures held in
physical memory and itself locking the pages.

The NIC 165 continues to service the doorbell register write
and constructs a Translation Look-aside Buffer (TLB) entry
167 located in SRAM 23. When data arrives corresponding to a
particular VIA endpoint, the incoming address matches an
aperture 169 in the NIC, which has been marked as requiring
a TLB translation. This translation is carried out by state
machine 46 and determines the physical memory address of
block 155.

The TLB translation, having been previously set up, occurs
with little overhead and the data is written 175 to
appropriate memory block 155. A Tripwire 171 will have been
arranged (when the TLB entry 167 was constructed) to match
when the address range corresponding to block 155 is written
to. This Tripwire match causes the firmware 173 (implemented
in state machine 51) to place the receive descriptor 157
onto completion queue 163, to invalidate the TLB mapping 167
and optionally generate an interrupt. If the RxQueue 159 has
been loaded with other receive descriptors, then the next
descriptor is taken and loaded into the TLB as previously
described. If application 153 is blocked waiting for data to
arrive, the interrupt generated will result (after a device
driver has performed a search of all the completion queues
in the system) in application 153 being re-scheduled. If
there is no TLB mapping for the VIA Aperture addresses, or
the mapping is invalid, an error is raised using an
interrupt. If the NIC 165 is in the process of reloading the
TLB 167 when new data arrives, then hardware flow control
mechanism 31 is used to control the data until a path to the
memory block in computer 151 has been completed.
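
The receive path described above may be sketched, under stated assumptions, as a small Python model. The names are hypothetical. A receive descriptor is reduced to a `(base, length)` pair, the TLB holds a single entry, and the Tripwire is assumed to match when the last byte of the block is written. Flow control during TLB reload is not modelled.

```python
from collections import deque


class RxNic:
    def __init__(self, memory: bytearray):
        self.memory = memory             # stands in for host physical memory
        self.rx_queue = deque()          # receive descriptors: (base, length)
        self.completion_queue = deque()
        self.tlb = None                  # currently loaded descriptor, if any
        self.interrupts = 0
        self.errors = 0

    def _load_next(self) -> None:
        # Take the next receive descriptor, if there is one, into the TLB.
        self.tlb = self.rx_queue.popleft() if self.rx_queue else None

    def ring_doorbell(self) -> None:
        if self.tlb is None:
            self._load_next()

    def receive(self, offset: int, payload: bytes) -> bool:
        """Handle data arriving at `offset` within the VIA aperture."""
        if self.tlb is None:             # no valid mapping: raise an error
            self.errors += 1
            return False
        base, length = self.tlb
        end = offset + len(payload)
        if end > length:                 # write would fall outside the block
            self.errors += 1
            return False
        self.memory[base + offset:base + end] = payload
        if end == length:                # Tripwire match: block fully written
            self.completion_queue.append(self.tlb)
            self._load_next()            # invalidate mapping, load next descriptor
            self.interrupts += 1
        return True
```

The point of the sketch is that completion handling is triggered by the address of the arriving data, so no per-packet software processing is needed on the receiving host.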

As an optional extension to the VIA standard, the NIC could
also respond to Tripwire match 171 by placing an index on
Tripwire FIFO 42, which could enable the device driver to
identify the active VIA endpoint without searching all
completion queues in the system.
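
The difference between the two ways of locating the active endpoint can be illustrated with the following Python sketch. Both function names are hypothetical, and the Tripwire FIFO is modelled as a plain queue of endpoint indices.

```python
from collections import deque


def find_by_search(completion_queues: dict) -> list:
    """Baseline: the driver scans every completion queue in the system."""
    return [endpoint for endpoint, queue in completion_queues.items() if queue]


def find_by_fifo(tripwire_fifo: deque) -> list:
    """Optional extension: the driver pops indices queued by the NIC."""
    active = []
    while tripwire_fifo:
        active.append(tripwire_fifo.popleft())
    return active
```

With the FIFO, the work done by the driver depends on the number of matches that occurred rather than on the number of endpoints in the system.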

This method can be extended to provide support for I2O and
the forthcoming Next Generation I/O (NGIO) standard. Here,
the transmit, receive and completion queues are located on
the NIC rather than in the physical memory of the computer,
as is currently the case for the VIA standard.

As mentioned previously, another aspect of this invention is
its use in providing support for the outbound streaming of
data through the NIC. This setup is described in Fig. 14. It
shows a Direct Memory Access (DMA) engine 182 on the NIC
183, which has been programmed in the manner previously
described by a number of user-level applications 184. These
applications have requested that the NIC 183 transfer their
respective data blocks 181 through the NIC 183, local bus
189, fibre-optic transceiver 190 and onto network 200. After
each application has placed its data transfer request onto
the DMA request queue 185, it blocks, awaiting a
re-schedule, initiated by device driver 187. It can be
important that the system maintains fair access between a
large number of such applications, especially under
circumstances where an application requires a strict
periodic access to the queue, such as an application
generating a video stream.

Data transferred onto the network by the DMA engine 182
traverses local bus 189, and is monitored by the Tripwire
unit 186. This takes place in the same manner as for
received data (both transmitted and received data pass
through the NIC using the same local bus 55).

Each application, when programming the DMA engine 182 to 
transmit a data block, also constructs a Tripwire which is 
set to match on an address in the data block. The address to 
match could indicate that all or a certain portion of the 
data has been transmitted. When this Tripwire fires and
causes a hardware interrupt 188, the device driver 187 can
quickly determine which application should be made runnable.
By causing a system reschedule, the application can be run 
on the CPU at the appropriate moment to generate more DMA 
requests. Because the device driver can execute at the same 
time that the DMA engine is transferring data, this decision 
can be made in parallel to data transfer operations. Hence, 
by the time that a particular application's data transfer 
requests have been satisfied, the system can ensure that the 
application is running on the CPU and able to generate more
requests.
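
As a sketch only, the outbound streaming arrangement can be modelled in Python. The class `OutboundNic` and its members are hypothetical. Each request registers one Tripwire address inside its data block, and the DMA loop stands in for the Tripwire unit watching the local bus, with the `runnable` list standing in for the device driver's reschedule decision.

```python
from collections import deque


class OutboundNic:
    def __init__(self):
        self.dma_queue = deque()   # pending (start, length) transfer requests
        self.tripwires = {}        # address -> application that set the Tripwire
        self.runnable = []         # applications the driver would reschedule

    def submit(self, app_id, start, length, fire_at=None):
        """Queue a block for DMA and set a Tripwire on an address within it.

        fire_at defaults to the last byte (whole block sent); an earlier
        address wakes the application when only a portion has been sent.
        """
        addr = start + length - 1 if fire_at is None else fire_at
        if not start <= addr < start + length:
            raise ValueError("Tripwire address must lie within the block")
        self.tripwires[addr] = app_id
        self.dma_queue.append((start, length))

    def run_dma(self):
        while self.dma_queue:
            start, length = self.dma_queue.popleft()
            for addr in range(start, start + length):
                # Each address crossing the local bus is compared
                # against the stored Tripwires.
                app_id = self.tripwires.pop(addr, None)
                if app_id is not None:
                    self.runnable.append(app_id)
```

Setting the Tripwire part-way through a block, as the video application does in the test below, lets that application be rescheduled early enough to have its next request ready before the current transfer completes.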



CLAIMS 



1. A method of synchronising between a sending
application on a first computer and a receiving
application on a second computer, each computer having a
main memory, and at least one of the computers having an
asynchronous network interface, comprising the steps of:

providing the asynchronous network interface with a
set of rules for directing incoming data to memory
locations in the main memory of the second computer;

storing in the network interface one or more
triggering value(s), each triggering value representing a
state of a data transfer between the applications;

receiving, at the network interface, a data stream
being transferred between the applications;

comparing at least part of the data stream received
with the stored triggering values;

if any compared part of the data stream matches any
triggering value, indicating that the triggering value has
been matched; and

storing the data received in the main memory of the
second computer at one or more memory location(s) in
accordance with the said rules.

2. A method according to claim 1, in which the step of
providing the asynchronous network interface with a set of
rules comprises the step of establishing a mapping between
information contained within the incoming data stream and
one or more memory location(s) of the main memory of the
second computer.

3. A method according to claim 2, in which the
asynchronous network interface is a memory mapped network
interface, and in which the step of providing the memory
mapped network interface with a set of rules comprises the
step of establishing a mapping between addresses contained
within the incoming data stream and one or more memory
location(s) of the main memory of the second computer.

4. A method according to any of claims 1 to 3, further
comprising storing in the asynchronous network interface
an action, corresponding to each triggering value, which
is to be carried out, in the event that the triggering
value is matched, to indicate that the triggering value
has been matched.

5. A method according to any of claims 1 to 4, comprising
the step of sending an interrupt when a triggering value
matches.

6. A method according to any of claims 1 to 5, comprising
the step of changing the value of a counter when a
triggering value is matched.

7. A method according to any preceding claim in which the
triggering value(s) comprise(s) address data, and the part
of the data stream compared with the stored triggering
value(s) comprises address data.

8. A method according to any preceding claim, wherein the
step of storing a triggering value is initiated by an
application on one of the computers writing a triggering
value to a memory location in the local control aperture
within the address space of the network interface.

9. A method according to any preceding claim comprising
the steps of accessing the main memory of the sending
application, and outputting data therefrom.

10. A method according to any preceding claim comprising
the step of mapping each physical destination address of
the data being sent to a virtual memory address on a
sending computer.

11. A method according to any preceding claim, both
computers having an asynchronous network interface,
comprising the step of sending the data stream from the
sending network interface to the receiving network
interface.

12. A method according to claim 11 comprising the step of
mapping each virtual address of the received data stream
to a physical address memory location of the main memory
of the receiving computer.

13. A method according to any preceding claim comprising
the step of writing the transferred data to the main
memory of the receiving computer.

14. A method according to any preceding claim, each
computer having a network interface also having an I/O
bus, the method comprising the step of providing the
network interface with a local bus, and a bridge for
interfacing between the local bus and the I/O bus of the
computer.

15. A method according to claim 14, comprising the step
of loading the bridge with predetermined configuration
data.

16. An asynchronous network interface, for use in a host
computer having a main memory and being connected to a
network, the interface comprising:

means for storing a set of rules for directing
incoming data to memory locations in the main memory of
the host computer;

a memory for storing one or more triggering value(s),
each value representing a state of a data transfer between
two or more applications in the computer network;

a receiver for receiving a data stream being
transferred between two or more applications in the
computer network;

comparison means for comparing at least part of the
data stream received by the network interface with the
stored triggering values; and

a memory for storing information identifying any
matched triggering values.

17. An asynchronous network interface according to claim
16, in which the set of rules comprises a memory mapping.

18. An asynchronous network interface according to claim
16 or 17, further comprising means for performing an
action corresponding to a matched triggering value.

19. An asynchronous network interface according to claim
16, 17 or 18, further comprising a local bus.



20. An asynchronous network interface according to claim 
19, the host computer having an I/O bus, the interface 
further comprising a bridge for interfacing between the 
I/O bus of the computer and the local bus of the network 
interface. 

21. An asynchronous network interface according to any of 
claims 16 to 20, wherein the comparison means comprises a 
content-addressable memory. 

22. An asynchronous network interface according to claim 
21, wherein the comparison means comprises two or more 
content-addressable memories which are arranged so as to 
conduct a pipelined comparison of the data stream received
by the network interface. 

23. An asynchronous network interface according to any of 
claims 16 to 22, further comprising receive and transmit 
serialisers. 

24. An asynchronous network interface according to any of 
claims 16 to 23, comprising a memory for storing 
configuration data for the bridge. 

25. An asynchronous network comprising two or more 
computers each having an asynchronous network interface 
according to any of claims 16 to 24. 

26. A method of passing data between an application on a
first computer and remote hardware within a second
computer or on a passive backplane, the first computer
having a main memory and an asynchronous network
interface, the method comprising the steps of:

providing the asynchronous network interface with a
set of rules for directing incoming data to memory or I/O
location(s) of the remote hardware;

storing in the network interface one or more
triggering value(s), each triggering value representing a
state of a data transfer between the application and the
hardware;

receiving, at the network interface, a data stream
being transferred between the application and the
hardware;

comparing at least part of the data stream received
with the stored triggering value(s);

indicating that a triggering value has been matched,
if any compared part of the data stream matches a
triggering value;

and, when a data stream is being passed from the
first computer to the remote hardware, storing data
received by the remote hardware in memory or I/O
location(s) of the remote hardware in accordance with the
said rules; and,

when a data stream is being transferred from the
remote hardware to the first computer, storing the data
received in the main memory of the first computer at one
or more memory location(s) in accordance with the said
rules.

27. A method according to claim 26, in which the step of 
providing the asynchronous network interface with a set of 
rules comprises the step of establishing a mapping between 
information contained within the incoming data stream and
one or more memory or I/O location(s) of the receiving
computer or hardware. 

28. A method according to claim 27, in which the
asynchronous network interface is a memory mapped network
interface, and in which the step of providing the memory
mapped network interface with a set of rules comprises the
step of the first computer establishing a mapping, either
locally or remotely, between addresses contained within
the incoming data stream and one or more memory or I/O
location(s) of the receiving computer or hardware.

29. A method according to any of claims 26 to 28, further
comprising storing in the asynchronous network interface
an action, corresponding to each triggering value, which
is to be carried out, in the event that the triggering
value is matched, to indicate that the triggering value
has been matched.



30. A method according to any of claims 26 to 29, 
comprising the step of sending an interrupt when a 
triggering value matches. 

31. A method according to any of claims 26 to 30, 
comprising the step of changing the value of a counter 
when a triggering value matches. 

32. A method according to any of claims 26 to 31, in
which the triggering value(s) comprise(s) address data,
and the part of the data stream compared with the stored
triggering value(s) comprises address data.

33. A method according to any of claims 26 to 32, wherein
the step of storing a triggering value is initiated by an
application on a computer writing a triggering value to a
memory location in the local control aperture within the
address space of the network interface.

34. A method according to any of claims 26 to 33,
comprising the steps of accessing the main memory of the
application, and outputting data therefrom.

35. A method according to any of claims 26 to 34,
comprising the step of mapping each physical destination
address of the data being sent to a virtual memory
address on a computer.

36. A method according to any of claims 26 to 35, both
computers having an asynchronous network interface,
comprising the step of sending the data stream from the
sending network interface to the receiving network
interface.

37. A method according to any of claims 26 to 36,
comprising the step of mapping each virtual address of the
received data stream to a physical memory address or I/O
location of the receiving computer or remote hardware.

38. A method according to any of claims 26 to 37,
comprising the step of writing the transferred data to the
main memory of the receiving computer.

39. A method according to any of claims 26 to 38, each
computer or passive backplane having a network interface
also having an I/O bus, the method comprising the step of
providing each network interface with a local bus, and a
bridge for interfacing between the local bus and the I/O
bus of the computer or passive backplane.

40. A method according to claim 39, comprising the step 
of loading the bridge with predetermined configuration 
data. 

41. A method according to claim 40, in which the 
configuration data includes configuration data relating to 
the remote hardware. 

42. A method according to any of claims 26 to 41, each
computer and/or passive backplane having an I/O bus, the
method further comprising the steps of:

loading the network interface of one of the
computer(s) and/or of the passive backplane with data for
configuring it to capture one or more predefined interrupt
signal(s) on the I/O bus of that computer or passive
backplane;

transferring a captured interrupt signal over the
network to a network interface of another computer or
passive backplane; and

loading the network interface of one of the
computer(s) or of the passive backplane to assert one or
more predefined interrupt signal(s) on the I/O bus of that
computer or passive backplane, on receipt of the said
transferred captured interrupt signal.

43. A method of arranging data transfers from one or more
applications on a computer, the computer having a main
memory, an asynchronous network interface, and a Direct
Memory Access (DMA) engine having a request queue address
common to all the applications, comprising the steps of:

the application requesting the network interface to
store one or more triggering value(s) corresponding to a
data block to be transferred;

an application requesting the DMA engine to transfer
a block of data;

the network interface storing one or more triggering
value(s) corresponding to the data block to be
transferred, along with an identification of the
application which requested the DMA transfer;

the network interface monitoring the data stream
being sent by the applications and comparing at least part
of the data stream with the triggering value(s) stored in
its memory; and

if any triggering value matches, indicating that that
triggering value has matched.

44. A method according to claim 43, in which the
application requests a DMA transfer by setting up a
descriptor indicating the transfer required, and sending
this descriptor to the DMA request queue address.

45. A method according to claim 43 or 44, in which after
requesting a data transfer and storage of a triggering
value, the application blocks until it receives a
reschedule.

46. A method according to claim 43, 44 or 45, in which
when a triggering value matches, a reschedule is sent to
the application which requested the storage of that
triggering value.

47. A method according to any of claims 43 to 46, in 
which, if the request queue is full when an application 
attempts to add a new request, the network interface
indicates to that application that its requested transfer 
has failed. 

48. A method according to any of claims 44 to 47, further
comprising the steps of reading the first descriptor in
the request queue and retrieving data from the main memory
of the computer in accordance with the contents of the
descriptor.

49. A method according to claim 48, further comprising
the step of transmitting the data retrieved from the main
memory in accordance with the content of the corresponding
descriptor.

50. A method according to any of claims 43 to 49, further
comprising the step of interrupting the transfer of a data
block if the transfer is not completed after a
predetermined length of time from the start of that
transfer.

51. A method of transferring data from a sending
application on a first computer to a receiving application
on a second computer, each computer having a main memory,
and a memory mapped network interface, the method
comprising the steps of:

creating a buffer in the main memory of the second
computer for storing data being transferred as well as
data identifying one or more pointer memory location(s);

storing at said pointer memory location(s) at least
one write pointer and at least one read pointer for
indicating those areas of the buffer available for writes
and for reads;

in dependence on the values of the WRP(s) and RDP(s),
the sender application writing to the buffer;

updating the value of the WRP(s), after a write has
taken place, to update the indication of the area(s) of
the buffer available for reads and the area(s) available
for writes;

in dependence on the values of WRP(s) and RDP(s), the
receiver application reading from the buffer; and

updating the value of the RDP(s), after a read has
taken place, to update the indication of the area(s) of
the buffer available for reads and the area(s) available
for writes.

52. A method according to claim 51, in which the step of 
updating the value of the WRP(s) includes the sending 
application sending the updated value of the WRP to the 
main memory of the second computer, via the network. 

53. A method according to claim 51 or 52, in which the 
first computer comprises a processing means with a cache 
memory, comprising the step of the sending application 
storing the value of the updated WRP in the cache memory. 

54. A method according to claim 51, 52 or 53, in which 
the step of updating the value of the RDP(s) includes the 
receiving application sending the updated value of the RDP 
to the main memory of the first computer, via the network. 

55. A method according to claim 51, 52, 53 or 54, in 
which the second computer comprises a processing means 
with a cache memory, the method comprising the step of the 
receiving application storing the value of the updated RDP 
in its cache memory. 

56. A method according to any of claims 51 to 55,
comprising the steps of:

the network interface of the second computer storing
triggering value(s) corresponding to the address(es) of
one or more write pointer(s) (WRP(s));

the network interface of the second computer
monitoring the data stream received from the first
computer and comparing at least part of the data stream
with the triggering value(s) stored in its memory; and

if any triggering value matches, indicating that that
triggering value has matched.

57. A method according to claim 56, in which when a
triggering value is matched by the receipt of a WRP
write instruction, a receiver interrupt is generated.

58. A method according to any of claims 51 to 57, further
comprising the steps of:

providing a second buffer in the main memory of the
second computer for storing write pointer data;

storing one or more second-buffer write pointer(s)
and second-buffer read pointer(s) indicating the areas of
the second-buffer available for writes and reads;

when the sending application writes to the first-
buffer and updates the write pointer(s) of the first-
buffer, writing to said second-buffer, in accordance with
the value of the write pointer(s) and read pointer(s) of
the second-buffer, the updated value of the write pointer
of the first-buffer; and

updating the value of the second-buffer write
pointer(s) to update the indication of the area(s) of the
second-buffer available for writes and the area(s)
available for reads.

59. A method according to claim 58, further comprising
the steps of:

reading a first-buffer write pointer value from the
second buffer, in dependence on the contents of the
second-buffer read pointer(s) and second-buffer write
pointer(s), and

reading from the first buffer in dependence on the
value of a first-buffer pointer and the write pointer
value read from the second buffer.

60. A method according to any of claims 51 to 59, further
comprising the steps of:

the network interface of the first computer storing
triggering value(s) corresponding to address(es) of one or
more RDP(s);

the network interface of the first computer
monitoring the data stream received from the second
computer and comparing at least part of the data stream
with the triggering value(s) stored in its memory; and

if any triggering value matches, indicating that that
triggering value has matched.

61. A method according to claim 60, in which when the
network interface of the first computer matches a
triggering value by the receipt of an RDP write
instruction, a sender interrupt is generated.

62. A method according to any of claims 51 to 61, in
which the sending application blocks if the values of the
WRP(s) and RDP(s) indicate that the buffer is full.

63. A method according to claim 62, in which the sending 
application is unblocked on receipt of an interrupt. 



64. A method according to any of claims 51 to 63, in 
which the receiving application blocks if the values of 
the WRP(s) and RDP(s) indicate that the buffer is empty. 

65. A method according to claim 64, in which the 
receiving application is unblocked on receipt of an 
interrupt.

66. A method according to any of claims 51 to 65, in 
which a write pointer of a buffer points to the buffer 
address where the next byte of data should be written in 
that buffer. 

67. A method according to any of claims 51 to 66, in
which a read pointer of a buffer points to the buffer
address of the first byte of data to be read from that
buffer.

68. A method according to any of claims 51 to 67, in
which when an application has written to the end of a 
buffer, it next writes to the start of the buffer, 
depending on the value of the WRP(s) and RDP(s) 
corresponding to that buffer. 

69. A method according to any of claims 51 to 68, in 
which when an application has read to the end of a buffer, 
it next reads from the start of the buffer, depending on 
the value of the WRP(s) and RDP(s) corresponding to that 
buffer. 

70. A method according to any of claims 51 to 69, in
which the value of one or more WRPs and/or RDPs is updated 
when a triggering value is matched in a network interface. 

71. A computer network comprising two computers, the
first computer running a sending application and the
second computer running a receiving application, each
computer having a main memory and a memory mapped network
interface, the main memory of the second computer having:

a buffer for storing data being transferred between
computers as well as data identifying one or more pointer
memory location(s);

means for reading at least one write pointer (WRP)
and at least one read pointer (RDP) stored at (a) pointer
memory location(s), for indicating the areas of the buffer
available for writes and the area(s) available for reads;

the network interface of the second computer
comprising:

a memory mapping;

means for reading data from the buffer in accordance
with the contents of the WRP(s) and RDP(s); and

means for updating the value of the RDP(s) after a
read has taken place, to update the indication of the
area(s) of the buffer available for reads and the area(s)
available for writes.

72. A computer network according to claim 71, the network
interface of the first computer comprising:

a mapping memory; and

means for sending data to the buffer of the second
computer.

73. A computer network according to claim 71 or 72, the
main memory of the second computer storing the value of at
least one WRP.

74. A computer network according to claim 71, 72 or 73,
in which one or more pointer memory location(s) are in the
main memory of the first computer.

75. A computer network according to any of claims 71 to
74, in which one or more pointer memory location(s) are
located in the main memory of the second computer.

76. A computer network according to any of claims 71 to
75, in which the first computer comprises a processing
means with a cache memory, with one or more WRP(s) and/or
RDP(s) stored in that cache memory.

77. A computer network according to any of claims 71 to
76, in which the second computer has a processing means
with a cache memory, with one or more WRP(s) and/or RDP(s)
stored in that cache memory.

78. A computer network according to any of claims 71 to
77, in which the network interface of the first computer
comprises:

means for writing data to the buffer in accordance
with the values of at least one RDP and one WRP, using its
memory mapping; and

means for updating the value of the WRP(s) to update
the indication of the area(s) of the buffer available for
reads and the area(s) available for writes.



79. A computer network according to any of claims 71 to
78, in which the main memory of the second computer
comprises a second buffer; and the computer network also
having:

means for reading one or more write pointer(s) and
one or more read pointer(s) of the second buffer
indicating the areas of the second buffer available for
writes and those available for reads;

means for updating the write pointer(s) of the first
buffer, when an application running on one of the
computers writes to the first buffer;

means for writing to said second buffer, in
accordance with the value of the write pointer(s) and read
pointer(s) of the second buffer, the updated value of the
write pointer of the first buffer; and

means for updating the value of the second buffer's
write pointer(s) to update the indication of the area(s)
of the second buffer available for reads and the area(s)
available for writes.

80. A computer network according to claim 79, further
comprising means for storing one or more write pointer(s)
of the second buffer indicating the areas of the second
buffer available for reads and the area(s) available for
writes.

81. A computer network according to any of claims 51 to
80, in which the first and/or second buffer is a circular
buffer.
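
By way of illustration only, the pointer-managed circular buffer recited above can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the claimed apparatus: the class name PointerBuffer, the byte-wise copy loop, and the convention of leaving one slot empty so that equal pointers always mean "empty" are all assumptions made for the example and are not taken from the specification.

```python
class PointerBuffer:
    """Circular buffer whose free and filled areas are described by a
    write pointer (WRP) and a read pointer (RDP).

    Hypothetical illustration: names and the "one empty slot" rule are
    assumptions of this sketch.
    """

    def __init__(self, size):
        self.data = bytearray(size)
        self.size = size
        self.wrp = 0  # next position the sender will write
        self.rdp = 0  # next position the receiver will read

    def readable(self):
        # Bytes written by the sender but not yet consumed.
        return (self.wrp - self.rdp) % self.size

    def writable(self):
        # One slot stays empty so that WRP == RDP always means "empty".
        return self.size - 1 - self.readable()

    def write(self, payload):
        """Sender side: copy data in, then advance the WRP."""
        if len(payload) > self.writable():
            return False  # not enough free area; caller must retry later
        for byte in payload:
            self.data[self.wrp] = byte
            self.wrp = (self.wrp + 1) % self.size
        return True

    def read(self, count):
        """Receiver side: copy data out, then advance the RDP."""
        count = min(count, self.readable())
        out = bytearray()
        for _ in range(count):
            out.append(self.data[self.rdp])
            self.rdp = (self.rdp + 1) % self.size
        return bytes(out)
```

In this sketch only the writer changes the WRP and only the reader changes the RDP, which is why each side can publish its own pointer to the other computer without further locking.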

82. A computer network according to claim 79, 80 or 81, in
which the network interface of the second computer also
comprises:

means for reading a first-buffer WRP value from the
second buffer in accordance with the values of the
second-buffer WRP(s) and RDP(s);

means for updating the RDP(s) of the second buffer to
update the indication of the areas of the second buffer
available for reads and writes;

means for reading data from the first buffer in
accordance with the contents of the first-buffer WRP value
read from the second buffer, and a first-buffer RDP; and

means for updating the value of the RDP(s) of the
first buffer to update the indication of the area(s) of
the first buffer available for reads and writes when an
application running on the second computer reads from the
first buffer.

83. A computer network according to any of claims 51 to
82, the network interface of one or both computers also
comprising:

a memory for storing triggering value(s),
corresponding to one or more address(es) of WRP(s) and/or
RDP(s);

means for monitoring a data stream being transferred
between the two computers and for comparing at least part
of the data stream being transferred with the stored
triggering value(s); and

means for indicating that a triggering value has
matched, when the part of the data stream being compared
matches a triggering value.

84. A computer network according to claim 83, in which
the means for indicating that a triggering value has been
matched comprises means for generating an interrupt.
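
The triggering-value comparison recited above can be pictured with a small software model. This is only a hedged sketch: the class TripwireTable, its methods arm and snoop, and the optional on_match callback (standing in for an interrupt) are hypothetical names, and a Python dictionary is used here in place of the matching hardware.

```python
class TripwireTable:
    """Software stand-in for a store of triggering values.

    Hypothetical illustration: a dict lookup replaces hardware matching,
    and a callback replaces the interrupt line.
    """

    def __init__(self, on_match=None):
        self.counters = {}        # triggering address -> event counter
        self.on_match = on_match  # called on a match, like an interrupt

    def arm(self, address):
        """Store a triggering value, e.g. the address of a WRP or RDP."""
        self.counters.setdefault(address, 0)

    def snoop(self, stream):
        """Compare each (address, value) write in the stream with the
        stored triggering values and record every match."""
        for address, value in stream:
            if address in self.counters:
                self.counters[address] += 1
                if self.on_match is not None:
                    self.on_match(address, value)
```

For example, arming the address at which a write pointer is stored lets the receiving side learn that the pointer was updated without inspecting every word of the data stream itself.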

85. A method of sending a request from a client
application on a first computer to a server application on
a second computer, and sending a response from the server
application to the client application, both computers
having a main memory and a memory mapped network
interface, the method comprising the steps of:

(A) providing a buffer in the main memory of each
computer;

(B) the client application providing software stubs
which produce a marshalled stream of data representing the
request;

(C) the client application sending the marshalled
stream of data to the server's buffer;

(D) the server application unmarshalling the stream
of data by providing software stubs which convert the
marshalled stream of data into a representation of the
request in the server's main memory;

(E) the server application processing the request and
generating a response;

(F) the server application providing software stubs
which produce a marshalled stream of data representing the
response;

(G) the server application sending the marshalled
stream of data to the client's buffer; and

(H) the client application unmarshalling the received
stream of data by providing software stubs which convert
the received marshalled stream of data into a
representation of the response in the client's main
memory.
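
Steps (B) to (H) above can be sketched with a pair of hand-written stubs. The wire layout used here (a length-prefixed operation name followed by two 32-bit integers in network byte order), the function names, and the toy "add"/"sub" service are all assumptions made for illustration; the buffers are modelled as plain byte strings rather than memory mapped regions.

```python
import struct


def marshal_request(op, a, b):
    """Client stub: turn a request into a flat stream of bytes."""
    name = op.encode("ascii")
    return struct.pack("!B", len(name)) + name + struct.pack("!ii", a, b)


def unmarshal_request(stream):
    """Server stub: rebuild the request from the marshalled stream."""
    name_len = stream[0]
    op = stream[1:1 + name_len].decode("ascii")
    a, b = struct.unpack("!ii", stream[1 + name_len:1 + name_len + 8])
    return op, a, b


def marshal_response(result):
    """Server stub: turn the response into a flat stream of bytes."""
    return struct.pack("!i", result)


def unmarshal_response(stream):
    """Client stub: rebuild the response from the marshalled stream."""
    return struct.unpack("!i", stream[:4])[0]


def serve(server_buffer):
    """Server application: unmarshal, process, marshal the response."""
    op, a, b = unmarshal_request(server_buffer)
    result = a + b if op == "add" else a - b
    return marshal_response(result)
```

A round trip then reads: the client calls marshal_request, the bytes are placed in the server's buffer, serve produces the reply bytes, and the client calls unmarshal_response on what arrives in its own buffer.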

86. A method according to claim 85, in which in step (C)
and/or step (G) the stream of marshalled data is sent
according to the method of any of claims 51 to 70.

87. A method according to claim 85 or 86, comprising the
step of the client and server stubs sending the marshalled
streams of data directly over the network, using the
memory mapped network interfaces.

88. A method according to claim 85, 86 or 87, in which
the sending and/or marshalling of a response by the server
application may take place at the same time as the client
application is unmarshalling the response from its buffer.

89. A method according to any of claims 85 to 88, in
which the sending and/or marshalling of a request by the
client application may take place at the same time as the
server application is unmarshalling the request from its
buffer.

90. A method according to any of claims 85 to 89, in
which the response generated by the server application
comprises two or more parts;

the server application providing software stubs which
convert at least a first part of the response into a
marshalled stream of data;

the server application sending the marshalled data
stream representing the first part of the response to the
client's buffer;

one or more parts of the response being provided by a
hardware device in the server computer in the form of a
marshalled stream of data; and

the hardware device sending its marshalled stream of
data to the client's buffer.

91. A method according to claim 90, in which one or more
parts of the response generated by the server application
is provided by another software application running on the
second computer in the form of a marshalled stream of
data; and

the software application sending its marshalled
stream of data to the client's buffer.

92. A method according to claim 90 or 91, in which each
part of the response is sent to an appropriate part of the
client's buffer such that when all parts of the response
have been received in the buffer, the contents of the
buffer comprise a marshalled data stream representing the
whole response from the server application.

93. A method according to any of claims 85 to 92,
comprising the steps of:

the network interface of the first computer storing
triggering value(s) corresponding to a property of one or
more parts of the expected response;

the network interface of the first computer
monitoring the response received from the server
application and comparing at least part of the data stream
with the triggering value(s) stored in its memory; and

if any triggering value matches, indicating that that
triggering value has matched.

94. A method according to claim 93, comprising the step
of sending an interrupt when a triggering value matches.

95. A method according to claim 93 or 94, comprising the
step of changing the value of a counter when a triggering
value is matched.



96. A method according to claim 93, 94 or 95, in which
the client application, while it is waiting for the
response from the server application, blocks or polls an
event counter.

97. A method of arranging data for transfer as a data
burst over a computer network comprising the steps of:
providing a header comprising the destination address of a
certain data word in the data burst, and a signal at the
beginning or end of the data burst for indicating the
start or end of the burst, the destination addresses of
other words in the data burst being inferrable from the
address in the header.

98. A method according to claim 97, in which the signal
identifying the start or end of a burst comprises a null
signal.

99. A method of processing a data burst received over a
computer network comprising the steps of:

reading a reference address from the header of the
data burst, and

calculating the addresses of each data word in the
burst from the position of that data word in the burst in
relation to the position of the data word to which the
address in the header corresponds, and from the reference
address read from the header.
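
The address calculation described above amounts to simple offset arithmetic, sketched below. The fixed word size of 4 bytes, the function name word_addresses, and the header_index parameter (the position in the burst of the word whose address the header carries) are assumptions of this sketch and are not specified in the claims.

```python
WORD_SIZE = 4  # bytes per data word; an assumption for this example


def word_addresses(reference_address, word_count, header_index=0,
                   word_size=WORD_SIZE):
    """Infer the destination address of every word in a burst.

    reference_address is the address read from the burst header, and
    header_index is the position in the burst of the word to which that
    address corresponds. Every other address follows from its distance
    from that word.
    """
    return [
        reference_address + (position - header_index) * word_size
        for position in range(word_count)
    ]
```

With header_index left at 0 the header addresses the first word and the remaining words occupy consecutive locations after it; a non-zero header_index covers the case where the header refers to some other word in the burst.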

100. A method of interrupting transfer of a data burst
over a computer network comprising the steps of:

halting transfer of a portion of the data burst which
has not yet been transferred, thereby splitting the data
burst into two burst sections, one which is transferred,
and one waiting to be transferred.

101. A method of restarting transfer of a data burst
that has been interrupted according to the method of claim
100, comprising the steps of:

calculating a new reference address for the
untransferred data burst section from the address
contained in the header of the whole data burst, and from
the position in the whole data burst of the first data
word of the untransferred data burst section in relation
to the position of the data word to which the address in
the header corresponds;

providing a new header for the untransferred data
burst section comprising the new reference address; and
transmitting the new header along with the untransferred
data burst section.

102. A method according to claim 101, comprising
calculating the new reference address for the
untransferred data burst section from the reference
address contained in the header of the whole data burst
and from the number of data words in the transferred data
burst section.
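
The interruption and restart of a burst can be illustrated as follows. This is a sketch under stated assumptions rather than the claimed method itself: a burst is modelled as a (reference address, list of words) pair, the header is assumed to address the first word of the burst, the word size is assumed to be 4 bytes, and the function name split_burst is hypothetical.

```python
def split_burst(reference_address, words, sent_count, word_size=4):
    """Split a burst after sent_count words have been transferred.

    Returns two (reference_address, words) pairs: the section already
    transferred, and the untransferred section with a new header
    address computed from the original address and the number of words
    already sent.
    """
    transferred = (reference_address, words[:sent_count])
    new_reference = reference_address + sent_count * word_size
    untransferred = (new_reference, words[sent_count:])
    return transferred, untransferred
```

Because the second section carries its own header address, it can be sent later as an ordinary burst and its words still land at the locations they would have reached had the transfer never been interrupted.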

103. A memory mapped network interface substantially as 
described herein with reference to Figures 5 to 15. 

104. A computer network comprising two or more computers
having a memory mapped network interface substantially as 
described herein with reference to Figures 5 to 15. 

105. A method of synchronising between a sending
application on a first computer and a receiving
application on a second computer, both computers having a
memory mapped network interface, substantially as
described herein with reference to Figures 5 to 15.

106. A protocol substantially as described herein with
reference to Figure 8.

107. A method of arranging data for transfer
substantially as described herein with reference to
Figures 5 to 15.

108. A method of processing data substantially as
described herein with reference to Figures 5 to 15.

109. A method of interrupting transfer of a data burst
substantially as described herein with reference to
Figures 5 to 15.

110. A method of restarting transfer of a data burst
substantially as herein described with reference to
Figures 5 to 15.

111. A method of arranging data transfers from one or
more applications on a computer, the computer having a
main memory, a memory mapped network interface, and a
Direct Memory Access (DMA) engine having a request queue
address common to all the applications, the method being
substantially as described herein.

112. A method of transferring data from a first
application on a first computer to a second application on
a second computer, the method being substantially as
described herein with reference to Figure 10.
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113. A method of passing data between an application on a 
first computer and remote hardware within a second 
computer or on a passive backplane, the method being 
substantially as described herein. 

114. A method of sending a request from a client
application to a server application and sending a response
from the server application to the client application, the
method being substantially as described herein.
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